My First Python Program
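The motivation comes from "The Zen of Python" and the way Python handles multi-precision integer arithmetic. Below is Python in action, compared with Perl (with the bignum module) and UNIX bc. Note that the Perl pragma is spelled bignum, in lowercase; the capitalised spelling used in the session below is probably why Perl printed a floating-point value instead of the full integer.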
    $ /cygdrive/c/Python25/python
    Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import this
    The Zen of Python, by Tim Peters

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!
    >>> 2**1000
    10715086071862673209484250490600018105614048117055336074437503883703510511249361
    22493198378815695858127594672917553146825187145285692314043598457757469857480393
    45677748242309854210746050623711418779541821530464749835819412673987675591655439
    46077062914571196477686542167660429831652624386837205668069376L
    >>> exit()

    $ perl -v

    This is perl, v5.8.8 built for cygwin-thread-multi-64int
    (with 8 registered patches, see perl -V for more detail)

    Copyright 1987-2006, Larry Wall

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using "man perl" or "perldoc perl". If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    $ echo "use Bignum; print 2**1000" | perl
    1.07150860718627e+301

    $ echo "2^1000" | bc
    10715086071862673209484250490600018105614048117055336074437503883703\
    51051124936122493198378815695858127594672917553146825187145285692314\
    04359845775746985748039345677748242309854210746050623711418779541821\
    53046474983581941267398767559165543946077062914571196477686542167660\
    429831652624386837205668069376
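If you would rather not compare 302 digits by eye, here is a small sketch that checks the same value programmatically. It assumes a current Python 3 interpreter rather than the Python 2.5 used in the session above (where the value prints with a trailing L), so treat it as an illustration only.

```python
# Sketch for Python 3 (assumption: the post itself uses Python 2.5).
# Python integers grow as needed, so 2**1000 is computed exactly.
n = 2 ** 1000

digits = str(n)
print(len(digits))      # number of decimal digits
print(digits[:20])      # leading digits, to compare with the session above
print(n.bit_length())   # bits needed to represent n

# A float keeps only about 15-17 significant digits, which is the kind
# of value the Perl one-liner above ended up printing.
print(float(n))
```

The exact integer and the float agree in their leading digits; only the integer keeps all of them.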
Recently a colleague passed me a fairly big IIS log file (170MB, 227K lines), and I thought it was a good time to practise my Python skills. BTW, this is my first not-so-trivial Python program. Its objective is to work out the hourly bytes sent, bytes received and hits. I also wanted to compare Python with AWK and Tcl (8.4.12).
Here is the "battlefield" for Python vs Tcl vs AWK under Cygwin. To be fair, each program is run 3 times.
    $ time ./sum.py iis.log > a
    real 0m14.672s
    user 0m0.015s
    sys  0m0.015s

    $ time ./sum.py iis.log > a
    real 0m15.391s
    user 0m0.031s
    sys  0m0.031s

    $ time ./sum.py iis.log > a
    real 0m15.094s
    user 0m0.015s
    sys  0m0.031s

    $ time ./sum.sh iis.log > b
    real 0m18.704s
    user 0m15.170s
    sys  0m0.373s

    $ time ./sum.sh iis.log > b
    real 0m18.219s
    user 0m14.951s
    sys  0m0.233s

    $ time ./sum.sh iis.log > b
    real 0m18.390s
    user 0m14.873s
    sys  0m0.483s

    $ time ./sum.tcl iis.log > c
    real 0m15.781s
    user 0m0.015s
    sys  0m0.015s

    $ time ./sum.tcl iis.log > c
    real 0m14.641s
    user 0m0.015s
    sys  0m0.000s

    $ time ./sum.tcl iis.log > c
    real 0m15.031s
    user 0m0.015s
    sys  0m0.000s

    # verify that the outputs are the same
    # btw, Python and Tcl write CRLF line endings by default (the native platform is Windows), hence dos2unix
    # note: as typed, the loop below reads file a on every pass; use < $i to hash a, b and c in turn
    $ for i in a b c
    do
      dos2unix < a | md5sum
    done
    83211bf4faa32495ca9eb52c6b520974 *-
    83211bf4faa32495ca9eb52c6b520974 *-
    83211bf4faa32495ca9eb52c6b520974 *-
Going by the real times, it is clear that Python and Tcl are neck and neck at about 15 seconds each, with AWK roughly 3 seconds behind. A comprehensive scripting language like Python or Tcl is definitely more versatile than a specific tool such as AWK. Below is the source code of each program in case you are interested in the details:
    $ cat sum.py
    #! /cygdrive/c/Python25/python

    import sys

    if len(sys.argv) != 2:
        print "Usage:", sys.argv[0], "<input-log>"
        exit(1)

    # hourly totals keyed by '00'..'23':
    # sc = bytes sent, cs = bytes received, cnt = hits
    sc={}
    cs={}
    cnt={}
    for i in range(24):
        index='%02d' % i
        sc[index]=0
        cs[index]=0
        cnt[index]=0

    file=open(sys.argv[1],'r')
    line=file.readline()
    while line:
        fields=line.split()
        # field 1 is the time (hh:mm:ss); keep only the hour
        times=fields[1].split(':')
        hour=times[0]
        sc[hour] += int(fields[18])
        cs[hour] += int(fields[19])
        cnt[hour] += 1
        line=file.readline()
    file.close()

    k=sc.keys()
    k.sort()
    for i in k:
        print i,sc[i],cs[i],cnt[i]

    $ cat sum.sh
    #! /bin/sh

    if [ $# -ne 1 ]; then
        echo "Usage: $0 <input-log>"
        exit 1
    fi

    awk '
    {
        split($2,t,":")
        hr=t[1]
        sc[hr]+=$19
        cs[hr]+=$20
        hit[hr]++
    }
    END {
        for ( h=0 ; h<24 ; ++h ) {
            hh=sprintf("%02d",h)
            print hh, sc[hh], cs[hh], hit[hh]
        }
    }' $1

    $ cat sum.tcl
    #! /cygdrive/c/ActiveTcl/8.4.12.0/bin/tclsh

    if { $argc != 1 } {
        puts stderr "Usage: $argv0 <input-log>"
        exit 1
    }
    set logfile [lindex $argv 0]
    if { ![file exists $logfile] } {
        puts stderr "Error. $logfile does not exist"
        exit 2
    }

    # initialise to 0
    set hours {}
    for { set h 0 } { $h < 24 } { incr h } {
        lappend hours [format {%02d} $h]
    }
    foreach hr $hours {
        set sc($hr) 0
        set cs($hr) 0
        set hit($hr) 0
    }

    set fp [open $logfile r]
    while { [gets $fp line] >= 0 } {
        set time [lindex $line 1]
        set hr [lindex [split $time :] 0]
        incr sc($hr) [lindex $line 18]
        incr cs($hr) [lindex $line 19]
        incr hit($hr)
    }
    close $fp

    foreach hr $hours {
        puts "$hr $sc($hr) $cs($hr) $hit($hr)"
    }
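For comparison, here is a rough sketch of the same hourly aggregation written as a reusable function. It is not the program that was timed above. It assumes Python 3, uses made-up sample records (the make_line helper is hypothetical), skips '#' header lines and short records (which sum.py does not do), and only reports hours that actually occur in the input.

```python
from collections import defaultdict

# Column positions copied from sum.py above:
# field 1 is the time, fields 18 and 19 are the two byte counters.
TIME_FIELD, SC_FIELD, CS_FIELD = 1, 18, 19


def hourly_totals(lines):
    """Return {hour: [sc_bytes, cs_bytes, hits]} for an iterable of log lines."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        fields = line.split()
        # Assumption: skip '#' header lines and records that are too short.
        if line.startswith("#") or len(fields) <= CS_FIELD:
            continue
        hour = fields[TIME_FIELD].split(":")[0]
        row = totals[hour]
        row[0] += int(fields[SC_FIELD])
        row[1] += int(fields[CS_FIELD])
        row[2] += 1
    return totals


def make_line(time, sc, cs):
    """Build a made-up 20-field record; only three fields matter here."""
    fields = ["-"] * 20
    fields[0] = "2008-06-01"
    fields[TIME_FIELD] = time
    fields[SC_FIELD] = str(sc)
    fields[CS_FIELD] = str(cs)
    return " ".join(fields)


sample = [
    "#Fields: date time ...",
    make_line("00:01:02", 100, 10),
    make_line("00:59:59", 200, 20),
    make_line("13:30:00", 50, 5),
]
totals = hourly_totals(sample)
for hour in sorted(totals):
    print(hour, *totals[hour])
```

Because hourly_totals accepts any iterable of lines, an open file object can be passed to it directly, so the log is read line by line rather than loaded into memory at once.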
I have just covered 200 pages (out of 746) of Learning Python, 3rd Edition, and hope to explore more features as I go into the details. So far, I particularly like the feature-rich OO methods available on the core objects. However, I still have not figured out how to differentiate between an attribute and a method of an object.
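One possible way to tell the two apart, sketched here for Python 3 with a hypothetical Counter class: a method is just an attribute whose value can be called, so the built-in callable() gives a reasonable first answer. It only reports that something can be called, so a data attribute that happens to hold a function would also show up as callable.

```python
# Sketch: a method is an attribute whose value is callable.
class Counter(object):      # hypothetical example class
    def __init__(self):
        self.hits = 0       # plain data attribute

    def bump(self):         # method
        self.hits += 1


c = Counter()
for name in ("hits", "bump"):
    value = getattr(c, name)
    kind = "method" if callable(value) else "attribute"
    print(name, kind)

# The same test works on built-in objects such as strings.
string_methods = [n for n in dir("abc")
                  if not n.startswith("_") and callable(getattr("abc", n))]
print(string_methods[:5])
```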