AWK Can Do Lookup
AWK can do lookups: the lookup table can be built once in the BEGIN block.
I chose a web access log as the example. The lookup is based on the Status Code Definitions in "Hypertext Transfer Protocol -- HTTP/1.1", e.g. 200 -> OK.
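To make the idea concrete, here is a minimal sketch of a lookup held in an AWK associative array. The two status codes are hard-coded purely for illustration, and the function name status_text is hypothetical; it is not the script developed in this post.

```shell
#!/bin/sh
# Minimal sketch: map an HTTP status code to its description with an
# awk associative array. status_text is a hypothetical helper name.
status_text() {
    awk -v code="$1" '
    BEGIN {
        # hard-coded lookup table, for illustration only
        L["200"] = "OK"
        L["404"] = "Not Found"
        # "in" tests membership without creating an empty entry
        if (code in L)
            print L[code]
        else
            print "unknown"
    }'
}

status_text 200
status_text 404
```

Everything happens in the BEGIN block, so awk reads no input here; the real script below fills the array from a file instead.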
My initial version relied on shell tricks that were very inefficient and error-prone. After browsing through "The AWK Programming Language" (written by the AWK authors: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan), I was able to come up with this clean and readable code. Although the book was written in 1988, IMHO it is still the best book on AWK.
#! /bin/sh

if [ $# -ne 2 ]; then
    echo "Usage: $0 <lookup-file> <data-file>"
    exit 1
fi
if [ ! -f "$1" ]; then
    echo "Error. \"$1\" lookup file does not exist"
    exit 2
fi
if [ ! -f "$2" ]; then
    echo "Error. \"$2\" data file does not exist"
    exit 3
fi

# pass the lookup file name with -v so names containing spaces or quotes
# cannot break the awk program text
gawk -v lookup="$1" '
BEGIN {
    # establish lookup: field 1 is the code, fields 2..NF the description
    while ( (getline < lookup) > 0 ) {
        V = $2
        for ( i = 3 ; i <= NF ; ++i ) {
            V = sprintf("%s %s", V, $i)
        }
        L[$1] = V
    }
}
{
    # HTTP status code summary (status code is field 9 of the access log)
    ++s[$9]
}
END {
    for ( i in s ) {
        printf("\"%s\" has %d counts\n", L[i], s[i])
    }
}' "$2"
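As an alternative sketch, the same table can be loaded without getline by passing both files to awk and using the common NR == FNR idiom, which is true only while the first file is being read. This is a different technique from the script above, not a replacement for it. It uses plain awk, the function name summarize is hypothetical, and it assumes the lookup file is not empty and that the status code is field 9, as in the access log used in this post. As an added assumption, codes missing from the lookup file are reported by their number.

```shell
#!/bin/sh
# Sketch: load the lookup from the first file, summarize the second.
# summarize is a hypothetical name; usage: summarize <lookup-file> <data-file>
summarize() {
    awk '
    NR == FNR {
        # first file: field 1 is the code, the rest of the line is the text
        code = $1
        sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the code and the blanks after it
        L[code] = $0
        next
    }
    {
        # second file: count each status code (field 9)
        ++s[$9]
    }
    END {
        for (c in s) {
            # fall back to the numeric code if the lookup has no entry
            label = (c in L) ? L[c] : c
            printf("\"%s\" has %d counts\n", label, s[c])
        }
    }' "$1" "$2"
}
```

It would be called the same way as the script, e.g. summarize lookup.txt access_log. As with any for-in loop in awk, the order of the output lines is unspecified.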
Here are the lookup file, the first lines of the access log, and a run of the above script, which builds the lookup dynamically from the lookup file:
$ cat lookup.txt
200 OK
201 Created
202 Accepted
203 Non Authoritative Information
204 No Content
205 Reset Content
206 Partial Content
300 Multiple Choices
301 Moved Permanently
302 Found
303 See Other
304 Not Modified
305 Use Proxy
306 Unused
307 Temporary Redirect
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request URI Too Long
415 Unsupported Media Type
416 Request Range Not Satisfiable
417 Expectation Failed
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported

$ head access_log
127.0.0.1 - - [01/Mar/2006:15:30:26 +0800] "GET / HTTP/1.1" 200 1456
127.0.0.1 - - [01/Mar/2006:15:30:26 +0800] "GET /apache_pb.gif HTTP/1.1" 200 2326
127.0.0.1 - - [01/Mar/2006:15:30:30 +0800] "GET /manual/ HTTP/1.1" 200 9187
127.0.0.1 - - [01/Mar/2006:15:30:30 +0800] "GET /manual/images/pixel.gif HTTP/1.1" 200 61
127.0.0.1 - - [01/Mar/2006:15:30:30 +0800] "GET /manual/images/apache_header.gif HTTP/1.1" 200 4084
127.0.0.1 - - [01/Mar/2006:15:30:30 +0800] "GET /manual/images/index.gif HTTP/1.1" 200 1540
127.0.0.1 - - [01/Mar/2006:15:30:38 +0800] "GET /manual/howto/cgi.html HTTP/1.1" 200 22388
127.0.0.1 - - [01/Mar/2006:15:30:38 +0800] "GET /manual/images/home.gif HTTP/1.1" 200 1465
127.0.0.1 - - [01/Mar/2006:15:30:38 +0800] "GET /manual/images/sub.gif HTTP/1.1" 200 6083
127.0.0.1 - - [01/Mar/2006:15:33:15 +0800] "GET /manual/howto/cgi.html HTTP/1.1" 200 22388

$ ./lookup.sh lookup.txt access_log
"Not Modified" has 239 counts
"Bad Request" has 1 counts
"Unauthorized" has 18 counts
"Forbidden" has 23 counts
"OK" has 11378 counts
"Not Found" has 3257 counts
"Internal Server Error" has 4 counts
"Bad Gateway" has 2 counts