01 October 2008
NOTE: This is an old article I wrote in October 2008, but it's still relevant today. It was originally posted on luckydonkey.com, which I am in the process of retiring.
To test for a memory leak in New Metal Army, I needed to replay my Apache log files against my test server. Using Twill, this was a doddle.
The only slightly icky thing about the script is the regular expression that parses a line from the Apache log file (in Combined Log Format). I got this from RegexBuddy (pretty much the only reason I run Windows nowadays), but I am sure you can get similar expressions from other regular expression libraries.
Anyway, I'm just chucking this out there in case someone else finds it useful.
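If you haven't met Combined Log Format before, here is a minimal sketch of how a pattern of this kind pulls a line apart. The group names, the `parse_line` helper and the sample log line are all assumptions made up for illustration; they are not the exact expression RegexBuddy produced.

```python
import re

# Combined Log Format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Group names are illustrative assumptions, not the ones from the script.
LOG_LINE = re.compile(
    r'^(?P<host>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<datetime>[^\]]+)\]\s+'
    r'"(?P<method>GET|POST|HEAD) (?P<file>[^ ?"]+)'
    r'(?:\?(?P<parameters>[^ "]*))? HTTP/[0-9.]+"\s+'
    r'(?P<status>[0-9]+)\s+(?P<size>[-0-9]+)\s+'
    r'"(?P<referrer>[^"]*)"\s+"(?P<useragent>[^"]*)"$'
)

# Made-up sample line, for demonstration only.
SAMPLE = ('127.0.0.1 - frank [10/Oct/2008:13:55:36 -0700] '
          '"GET /search?q=twill HTTP/1.1" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/5.0 (X11; Linux)"')


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_LINE.search(line.rstrip("\n"))
    return match.groupdict() if match else None


fields = parse_line(SAMPLE)
print("%s %s %s" % (fields["datetime"], fields["file"], fields["parameters"]))
```

Lines that don't match (other request methods, truncated entries) come back as `None`, so a replay loop can simply skip them.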
#!/usr/bin/env python
# Script to replay a file called apache.log against a server.
import re

import twill

test_server = "my.test.server.com"

# Parses one line in Combined Log Format. Only the three groups used below
# are named; the remaining fields are captured positionally.
reobj = re.compile(r'^(\S+)\s+(\S+\s+\S+)\s+\[(?P<datetime>[^]]+)\]\s+"(?:GET|POST|HEAD) (?P<file>[^ ?"]+)\??(?P<parameters>[^ ?"]+)? HTTP/[0-9.]+"\s+([0-9]+)\s+([-0-9]+)\s+"([^"]*)"\s+"([^"]*)"$', re.MULTILINE)

browser = twill.get_browser()


def filter_url(url):
    # Return True to skip a URL; by default nothing is skipped.
    return False


with open("apache.log") as log:
    for line in log:
        match = reobj.search(line)
        if match is None:
            continue
        f = match.group("file")
        p = match.group("parameters")
        d = match.group("datetime")
        path = "?".join([f, p]) if p else f
        url = test_server + path
        if filter_url(url):
            continue
        try:
            print("%s %s" % (d, url))
            browser.go(url)
        except ValueError:
            # This comes from twill parsing the HTML and it being malformed.
            # I don't really care about that, as long as I get the page.
            pass
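The `filter_url` hook in the script skips nothing as written. As a rough sketch, assuming you only want to replay page requests and not static assets, it could look something like this. The `SKIP_EXTENSIONS` list is hypothetical, so adjust it for your own site.

```python
# Hypothetical list of extensions to skip during the replay.
SKIP_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def filter_url(url):
    """Return True if the URL should be skipped during the replay."""
    # Drop the query string so "/site.css?v=2" is still recognised.
    path = url.split("?", 1)[0].lower()
    return path.endswith(SKIP_EXTENSIONS)


for url in ("my.test.server.com/index.html",
            "my.test.server.com/static/site.css?v=2"):
    print("%s -> %s" % (url, "skip" if filter_url(url) else "replay"))
```

This version works on the plain string, so it doesn't care whether the URL has a scheme in front of it.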
Hope that helps someone.