├── README.md
└── apache_parser.py


/README.md:
--------------------------------------------------------------------------------
# apache_parser.py

This is a Python script and command-line tool,
with no dependencies, that parses data
from Apache log files.

At this time it supports generating reports for:

* **uri** - pageviews for each uri
* **time** - datetime with the highest requests per second
* **status_code** - hits for each http status code
* **referral** - uris of referring sites
* **agent** - hits for each user agent
* **subscriptions** - the number of feed subscribers per uri.
  This is done by parsing user agents for their subscriber count.

## Usage

Here are some example uses:

    python apache_parser.py access.log subscriptions
    python apache_parser.py access.log uri --quantity 5
    python apache_parser.py access.log agent --cutoff 100

Help is also available at the command line:

    python apache_parser.py --help


## User agents successfully parsed for feed subscribers

These are the user agents that have been tested against
the feed subscription system:

    Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 3 subscribers; feed-id=7675226481817637975)
    Netvibes (http://www.netvibes.com/; 5 subscribers; feedId: 5404723)
    Bloglines/3.1 (http://www.bloglines.com; 1 subscriber)
    NewsGatorOnline/2.0 (http://www.newsgator.com; 1 subscribers)
    Zhuaxia.com 1 Subscribers
    AideRSS/1.0 (aiderss.com); 2 subscribers
    xianguo-rssbot/0.1 (http://www.xianguo.com/; 1 subscribers)
    Fastladder FeedFetcher/0.01 (http://fastladder.com/; 1 subscriber)
    HanRSS/1.1 (http://www.hanrss.com; 1 subscriber)
    livedoor FeedFetcher/0.01 (http://reader.livedoor.com/; 1 subscriber)


## Credits

This script draws inspiration and code from:

* http://effbot.org/zone/wide-finder.htm
* http://www.python.org/dev/peps/pep-0265/

--------------------------------------------------------------------------------
/apache_parser.py:
--------------------------------------------------------------------------------
import re, operator
from optparse import OptionParser
from operator import itemgetter

def restrict(lst, cutoff, count):
    'Restrict the list by minimum value or count.'
    if cutoff:
        # Build a list rather than a generator so it can still be sliced below.
        lst = [x for x in lst if x[1] > cutoff]
    if count:
        lst = lst[:count]
    return lst

def parse(filename):
    'Return tuple of dictionaries containing file data.'
    def make_entry(x):
        # Map the named groups of a regex match onto a plain dictionary.
        return {
            'server_ip':x.group('ip'),
            'uri':x.group('uri'),
            'time':x.group('time'),
            'status_code':x.group('status_code'),
            'referral':x.group('referral'),
            'agent':x.group('agent'),
            }
    log_re = '(?P<ip>[.:0-9a-fA-F]+) - - \[(?P