HTTP Status Codes Summary, The AWK Way
His code is something like this:

    $ awk '{++s[$(NF-1)]}END{for(i in s){print i,s[i]}}' access_log | sort
    200 916952
    302 10031
    304 265012
    400 22
    401 323
    404 253
    500 3048
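To see what the one-liner is doing, here is a minimal sketch that runs it on three log lines. The sample entries are made up by me for illustration; I am assuming Common Log Format, where the status code is the second-to-last field and the byte count is the last.

```shell
# Three hypothetical Common Log Format entries (invented for illustration)
sample='10.0.0.1 - - [01/Jan/2008:00:00:01 +0800] "GET / HTTP/1.1" 200 1043
10.0.0.2 - - [01/Jan/2008:00:00:05 +0800] "GET /missing HTTP/1.1" 404 209
10.0.0.1 - - [02/Jan/2008:09:13:44 +0800] "GET /logo.png HTTP/1.1" 200 5120'

# $(NF-1) is the second-to-last field, i.e. the status code.
# s[] is an associative array counting how often each code appears.
summary=$(printf '%s\n' "$sample" |
    awk '{ ++s[$(NF-1)] } END { for (i in s) print i, s[i] }' | sort)

printf '%s\n' "$summary"
```

The trailing sort is needed because "for (i in s)" visits the array in no particular order.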
This may serve his purpose for checking. However, I think it is possible to write an entire HTTP status code summary using gawk that presents the counts on a per-day basis. The "asort" function in gawk (see "Sorting Array Values and Indices" in the gawk manual) is very handy for sorting arrays so that rows and columns can be displayed in order. Below is my implementation:
    $ cat ncode.sh
    #! /bin/bash

    if [ $# -ne 1 ]; then
        echo "Usage: $0 <access_log>"
        echo "       <access_log> can be either plain text or gzip compressed"
        exit 1
    fi

    log=$1
    if [ ! -f "$log" ]; then
        echo "Error. $log does not exist"
        exit 2
    fi

    # Use zcat for gzip-compressed logs, cat otherwise
    if file "$log" | grep -q "gzip compressed data"; then
        cmd="zcat"
    else
        cmd="cat"
    fi

    $cmd "$log" | gawk '
    # Print a row of n dashes; i is declared as a local variable
    function separator(n,    i) {
        for ( i=1 ; i<=n ; ++i ) {
            printf("-")
        }
        printf("\n")
    }

    # The second-to-last field is the status code; $4 is "[dd/Mon/yyyy:hh:mm:ss"
    $(NF-1)>=100 && $(NF-1)<=505 {
        date=substr($4,2,11)
        code=$(NF-1)
        a_code[code]=code
        a_date[date]=date
        a_cd[date,code]++
    }

    END {
        # asort() sorts by value and renumbers the indices 1..n.
        # Dates sort as strings, so the order is chronological only within one month.
        nc=asort(a_code)
        nd=asort(a_date)

        separator(80)
        # header for http code
        printf("HTTP Codes:")
        for ( c=1 ; c<=nc ; ++c ) {
            printf("%8d", a_code[c])
        }
        printf("%8s\n", "Total")
        separator(80)

        # result per date
        for ( d=1 ; d<=nd ; ++d ) {
            printf("%s", a_date[d])
            total=0
            for ( c=1 ; c<=nc ; ++c ) {
                value=a_cd[a_date[d],a_code[c]]
                printf("%8d", value)
                total+=value
            }
            printf("%8d\n", total)
        }
        separator(80)

        # total by code
        printf("%-11s", "Total:")
        all=0
        for ( c=1 ; c<=nc ; ++c ) {
            total=0
            for ( d=1 ; d<=nd ; ++d ) {
                value=a_cd[a_date[d],a_code[c]]
                total+=value
            }
            all+=total
            printf("%8d", total)
        }
        printf("%8d\n", all)
        separator(80)
    }'
It took just under 15 seconds on my notebook (Intel Celeron 1.4GHz, 512 MB memory) to summarise 1,192,178 lines of web access log with gawk 3.1.6 in Cygwin.
    $ ./ncode.sh
    Usage: ./ncode.sh <access_log>
           <access_log> can be either plain text or gzip compressed

    $ ./ncode.sh access_log.gz
    --------------------------------------------------------------------------------
    HTTP Codes:     200     302     304     400     401     404     500   Total
    --------------------------------------------------------------------------------
    01/Jan/2008   22038       8     290       0       2       0       0   22338
    02/Jan/2008   30732     499   11427       0      14      10     100   42782
    03/Jan/2008   31988     529   11718       0      14       6     203   44458
    04/Jan/2008   23525      81    2199       0       3       2       1   25811
    05/Jan/2008   21865       1     246       1       0       1       0   22114
    06/Jan/2008   29891     184    7874       2       0       5      60   38016
    07/Jan/2008   30866     370   10107       4      11       8      23   41389
    08/Jan/2008   32001     608   12380       1      24      22      67   45103
    09/Jan/2008   33043     586   14069       0      42      11     151   47902
    10/Jan/2008   34076     438   12374       0      28      14     128   47058
    11/Jan/2008   23703      63    2604       0       0       1       5   26376
    12/Jan/2008   21811      17     393       0       1       1       3   22226
    13/Jan/2008   30458     341    7867       1      18      12      89   38786
    14/Jan/2008   32659     348   10302       0       9      11      65   43394
    15/Jan/2008   34758     539   13515       2      15      13      79   48921
    16/Jan/2008   32477     457   13728       0      22      11     924   47619
    17/Jan/2008   33215     406   10919       0      15       8      75   44638
    18/Jan/2008   23717      90    1275       0       0       3      80   25165
    19/Jan/2008   21947       0      42       0       0       0      54   22043
    20/Jan/2008   33129     378   11618       0      15      24     102   45266
    21/Jan/2008   32149     493   14163       0       8      18      78   46909
    22/Jan/2008   34153     477   13045       1       9       9      82   47776
    23/Jan/2008   32234     312   10560       0       5       6      77   43194
    24/Jan/2008   34076     533   12402      10      12       4      70   47107
    25/Jan/2008   23917      98    1724       0       4       2     106   25851
    26/Jan/2008   22046       0       6       0       0       1      46   22099
    27/Jan/2008   32851     329   12652       0      17       8      58   45915
    28/Jan/2008   36528     447   14036       0       6       8      83   51108
    29/Jan/2008   36627     664   14179       0       9      18      97   51594
    30/Jan/2008   33731     522   10825       0       8      10      87   45183
    31/Jan/2008   20741     213    6473       0      12       6      55   27500
    --------------------------------------------------------------------------------
    Total:       916952   10031  265012      22     323     253    3048 1195641
    --------------------------------------------------------------------------------
You must be wondering why I need to 'reinvent the wheel' when there are free open source tools (e.g. Analog, AWStats, Webalizer, ...) that can do a much better job. It is because I believe:
"I hear and I forget; I see and I remember; I do and I understand"
- Chinese Proverb
"Willing is not enough; we must do. Knowing is not enough; we must apply."
- Bruce Lee
The equivalent in the IT world would be:
"I install and I am just an Installer; I use and I am just a User; I write and I am proud to call myself a Software Engineer"
- Chihung's proverb, hopefully someone will quote it in the future :-)
Labels: awk, Cygwin, shell script