├── script.awk
├── apacheawk-sh
├── test
└── README.creole

/script.awk:
--------------------------------------------------------------------------------
BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" }

--------------------------------------------------------------------------------

/apacheawk-sh:
--------------------------------------------------------------------------------
alias apacheawk="gawk -vFPAT='([^ ]+)|(\"[^\"]+\")|(\\\\[[^\\\\]]+\\\\])' "

--------------------------------------------------------------------------------

/test:
--------------------------------------------------------------------------------
127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET /manual/elisp/index.html HTTP/1.1" 200 37230 "http://testlocalhost/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"

--------------------------------------------------------------------------------

/README.creole:
--------------------------------------------------------------------------------
= awk hack to aid hacking access logs =

Isn't access-log parsing frustrating?

This repo version-controls a nice little GAWK hack that lets you parse
files with complex fields, such as access logs.

The hack comes as a GAWK script and as a shell script that defines an
alias, which you can use like this:

{{{
$ . apacheawk-sh
$ apacheawk '$6 ~ /200/ {print $5}' /var/log/nginx/access.log.1 | sort | uniq
"GET / HTTP/1.1"
"GET /manual/elisp/index.html HTTP/1.1"
"GET /manual/elisp/Index.html HTTP/1.1"
"GET /scripts/app.js HTTP/1.1"
"GET /style.css HTTP/1.1"
}}}


This is all thanks to the very
[[http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html|clever AWK extension]]
that allows the FPAT variable to be defined. Instead of describing the
separator between fields, as FS does, FPAT is a regular expression
describing the fields themselves: here a field is a run of non-space
characters, a double-quoted string, or a string in square brackets.


=== some history ===

When I first got totally hacked off by this problem, sometime in the
1990s, I wrote a patch to gawk so that it would consume whole strings
if you told it to. It could also be told what the string delimiter
was.

Arnold Robbins rejected it in favour of this FPAT thing. It has seemed
to take a long time for FPAT to come to fruition, but it's finally
here and now this problem is solved.

Proof that if you wait long enough your patches become irrelevant.


=== stack overflow ===

stack overflow kinda made me make this a repo.

[[http://serverfault.com/a/623891/109647|Here's the link to my answer]].

--------------------------------------------------------------------------------