HTTP Status Codes Summary, The AWK Way
His code is something like this:
$ awk '{++s[$(NF-1)]}END{for(i in s){print i,s[i]}}' access_log | sort
200 916952
302 10031
304 265012
400 22
401 323
404 253
500 3048
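To see what the one-liner is counting, here is a minimal sketch run against a made-up three-line log. It assumes the log is in Apache's Common Log Format, where the last two fields are the status code and the response size, so $(NF-1) is the status code. The sample lines and the variable names log and summary are illustrative only and do not come from the real access log.

```shell
# Hypothetical sample log in Common Log Format (not real data)
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [01/Jan/2008:00:00:01 +0800] "GET /index.html HTTP/1.0" 200 1043
192.0.2.2 - - [01/Jan/2008:00:00:05 +0800] "GET /missing HTTP/1.0" 404 209
192.0.2.1 - - [02/Jan/2008:10:12:30 +0800] "GET /index.html HTTP/1.0" 200 1043
EOF

# Count each value of the second-to-last field, then sort the report by code
summary=$(awk '{ ++s[$(NF-1)] } END { for (i in s) print i, s[i] }' "$log" | sort)
echo "$summary"
rm -f "$log"
```

With this sample the report has one line for 200 (count 2) and one for 404 (count 1). The trailing sort is needed because "for (i in s)" visits array indices in no particular order.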
The one-liner may serve his purpose for checking. However, I think it is possible to write an entire HTTP status code summary in gawk that presents the counts on a per-day basis. The "asort" function in gawk (see "Sorting Array Values and Indices" in the gawk manual) is very handy for sorting arrays so that rows and columns can be displayed in order. Below is my implementation:
$ cat ncode.sh
#! /bin/bash
# Summarise HTTP status codes per day from a web access log.

if [ $# -ne 1 ]; then
    echo "Usage: $0 <access_log>"
    echo " <access_log> can be either plain text or gzip compressed"
    exit 1
fi

log=$1
if [ ! -f "$log" ]; then
    echo "Error. $log does not exist"
    exit 2
fi

# Use zcat for gzip compressed logs, cat otherwise
if file "$log" | grep -q "gzip compressed data"; then
    cmd="zcat"
else
    cmd="cat"
fi

$cmd "$log" | gawk '
# Print a line of n dashes (i is declared as a local variable)
function separator(n,    i)
{
    for ( i=1 ; i<=n ; ++i ) {
        printf("-")
    }
    printf("\n")
}

# The status code is the second-to-last field.
# The date is taken from $4, which looks like [01/Jan/2008:00:00:01
$(NF-1)>=100 && $(NF-1)<=505 {
    date=substr($4,2,11)
    code=$(NF-1)
    a_code[code]=code
    a_date[date]=date
    a_cd[date,code]++
}

END {
    # asort() sorts the values, renumbers the indices 1..n and returns n
    nc=asort(a_code)
    nd=asort(a_date)

    separator(80)
    # header for http code
    printf("HTTP Codes:")
    for ( c=1 ; c<=nc ; ++c ) {
        printf("%8d", a_code[c])
    }
    printf("%8s\n", "Total")
    separator(80)

    # result per date
    for ( d=1 ; d<=nd ; ++d ) {
        printf("%s", a_date[d])
        total=0
        for ( c=1 ; c<=nc ; ++c ) {
            # a date/code pair that never occurred is empty and prints as 0
            value=a_cd[a_date[d],a_code[c]]
            printf("%8d", value)
            total+=value
        }
        printf("%8d\n", total)
    }
    separator(80)

    # total by code
    printf("%-11s", "Total:")
    all=0
    for ( c=1 ; c<=nc ; ++c ) {
        total=0
        for ( d=1 ; d<=nd ; ++d ) {
            value=a_cd[a_date[d],a_code[c]]
            total+=value
        }
        all+=total
        printf("%8d", total)
    }
    printf("%8d\n", all)
    separator(80)
}'
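The asort step can be tried in isolation. The sketch below assumes gawk is installed (asort is a gawk extension and is not available in POSIX awk) and feeds it a made-up list of status codes. The array name codes and the shell variable sorted are illustrative.

```shell
# Hypothetical input: four status codes in arbitrary order, one duplicate
sorted=$(printf '404\n200\n302\n200\n' | gawk '
{ codes[$1] = $1 }                 # index by value so duplicates collapse
END {
    n = asort(codes)               # values sorted, indices renumbered 1..n
    for (k = 1; k <= n; ++k) {
        # space between values, newline after the last one
        printf("%s%s", codes[k], (k < n ? " " : "\n"))
    }
}')
echo "$sorted"
```

This prints 200 302 404. Because asort replaces the original indices with 1..n, the script stores each value as its own element (a_code[code]=code) so that the value is still available after sorting.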
The script took just under 15 seconds on my notebook (Intel Celeron 1.4GHz, 512 MB memory) to summarise 1,192,178 lines of web access log with gawk 3.1.6 in Cygwin.
$ ./ncode.sh
Usage: ./ncode.sh <access_log>
<access_log> can be either plain text or gzip compressed
$ ./ncode.sh access_log.gz
--------------------------------------------------------------------------------
HTTP Codes:     200     302     304     400     401     404     500   Total
--------------------------------------------------------------------------------
01/Jan/2008   22038       8     290       0       2       0       0   22338
02/Jan/2008   30732     499   11427       0      14      10     100   42782
03/Jan/2008   31988     529   11718       0      14       6     203   44458
04/Jan/2008   23525      81    2199       0       3       2       1   25811
05/Jan/2008   21865       1     246       1       0       1       0   22114
06/Jan/2008   29891     184    7874       2       0       5      60   38016
07/Jan/2008   30866     370   10107       4      11       8      23   41389
08/Jan/2008   32001     608   12380       1      24      22      67   45103
09/Jan/2008   33043     586   14069       0      42      11     151   47902
10/Jan/2008   34076     438   12374       0      28      14     128   47058
11/Jan/2008   23703      63    2604       0       0       1       5   26376
12/Jan/2008   21811      17     393       0       1       1       3   22226
13/Jan/2008   30458     341    7867       1      18      12      89   38786
14/Jan/2008   32659     348   10302       0       9      11      65   43394
15/Jan/2008   34758     539   13515       2      15      13      79   48921
16/Jan/2008   32477     457   13728       0      22      11     924   47619
17/Jan/2008   33215     406   10919       0      15       8      75   44638
18/Jan/2008   23717      90    1275       0       0       3      80   25165
19/Jan/2008   21947       0      42       0       0       0      54   22043
20/Jan/2008   33129     378   11618       0      15      24     102   45266
21/Jan/2008   32149     493   14163       0       8      18      78   46909
22/Jan/2008   34153     477   13045       1       9       9      82   47776
23/Jan/2008   32234     312   10560       0       5       6      77   43194
24/Jan/2008   34076     533   12402      10      12       4      70   47107
25/Jan/2008   23917      98    1724       0       4       2     106   25851
26/Jan/2008   22046       0       6       0       0       1      46   22099
27/Jan/2008   32851     329   12652       0      17       8      58   45915
28/Jan/2008   36528     447   14036       0       6       8      83   51108
29/Jan/2008   36627     664   14179       0       9      18      97   51594
30/Jan/2008   33731     522   10825       0       8      10      87   45183
31/Jan/2008   20741     213    6473       0      12       6      55   27500
--------------------------------------------------------------------------------
Total:       916952   10031  265012      22     323     253    3048 1195641
--------------------------------------------------------------------------------
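A caveat worth noting: the dates are sorted as plain dd/Mon/yyyy strings. That works for a single month like the January log above, but it would put 01/Feb/2008 before 31/Jan/2008 if a log spanned two months. One possible workaround, which is not part of the script above, is to build a sortable yyyy-mm-dd key first. The sketch below uses plain awk on made-up dates; the names names, month, p and iso are illustrative.

```shell
# Hypothetical dates spanning two months, deliberately out of order
iso=$(printf '01/Feb/2008\n31/Jan/2008\n15/Jan/2008\n' | awk '
BEGIN {
    # map month abbreviations to their numbers 1..12
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
    for (m = 1; m <= n; ++m) month[names[m]] = m
}
{
    split($1, p, "/")              # p[1]=day, p[2]=month name, p[3]=year
    printf("%04d-%02d-%02d\n", p[3], month[p[2]], p[1])
}' | sort)
echo "$iso"
```

The keys come out as 2008-01-15, 2008-01-31 and 2008-02-01, so a plain string sort now matches calendar order.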
You must be wondering why I need to 'reinvent the wheel' when there are free open source tools (e.g. Analog, AWStats, Webalizer, ...) that can do a much better job. It is because I believe:
"I hear and I forget; I see and I remember; I do and I understand"
- Chinese Proverb
"Willing is not enough; we must do. Knowing is not enough; we must apply."
- Bruce Lee
The equivalent in the IT world would be:
"I install and I am just an Installer; I use and I am just a User; I write and I am proud to call myself a Software Engineer"
- Chihung's proverb, hopefully someone will quote it in the future :-)
Labels: awk, Cygwin, shell script

