Monday, June 16, 2008

My First Python Program

This is my second attempt at learning Python since 2000. You may be wondering what the motivation is and whether I will "dump" my favourite scripting language, Tcl, to go full steam with Python. Tcl is still my "mother tongue", and there is definitely no harm in learning another "foreign language".

The motivation comes from "The Zen of Python" and the way Python handles multi-precision integer calculation. Below is Python in action, compared with Perl (with the bignum module) and UNIX bc:

$ /cygdrive/c/Python25/python
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>> 2**1000

$ perl -v

This is perl, v5.8.8 built for cygwin-thread-multi-64int
(with 8 registered patches, see perl -V for more detail)

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at, the Perl Home Page.

$ echo "use bignum; print 2**1000" | perl

$ echo "2^1000" | bc
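
To make the comparison a little more concrete, here is a minimal sketch of what Python's built-in integers give you without any extra module. It assumes a current Python 3 interpreter rather than the 2.5 session above (where the same expression yields a long), and the variable names are my own.

```python
# Python integers are arbitrary precision: no bignum-style import is needed.
n = 2 ** 1000

digits = str(n)
print(len(digits))     # 302 decimal digits
print(digits[:16])     # leading digits of the exact value

# A float keeps only about 15-17 significant digits, so an odd neighbour
# of 2**1000 cannot survive a round trip through float.
m = n + 1
print(int(float(m)) == m)   # False: the trailing +1 is lost
```

The exact integer and the float agree only in their leading digits, which is why Perl needs the bignum pragma and bc is used on the shell side.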

Recently my colleague passed me a pretty big IIS log file (170MB, 227K lines) and I thought it was a good time to practise my Python skills. BTW, this is my first non-trivial Python program. The objective of the program is to work out the hourly bytes sent, bytes received and hits. I also wanted to compare Python with AWK and Tcl (8.4.12).

Here is the "battlefield" for Python vs Tcl vs AWK in my Cygwin. To be fair, each program is run three times.

$ time ./ iis.log > a

real    0m14.672s
user    0m0.015s
sys     0m0.015s

$ time ./ iis.log > a

real    0m15.391s
user    0m0.031s
sys     0m0.031s

$ time ./ iis.log > a

real    0m15.094s
user    0m0.015s
sys     0m0.031s

$ time ./ iis.log > b

real    0m18.704s
user    0m15.170s
sys     0m0.373s

$ time ./ iis.log > b

real    0m18.219s
user    0m14.951s
sys     0m0.233s

$ time ./ iis.log > b

real    0m18.390s
user    0m14.873s
sys     0m0.483s

$ time ./sum.tcl iis.log > c

real    0m15.781s
user    0m0.015s
sys     0m0.015s

$ time ./sum.tcl iis.log > c

real    0m14.641s
user    0m0.015s
sys     0m0.000s

$ time ./sum.tcl iis.log > c

real    0m15.031s
user    0m0.015s
sys     0m0.000s
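
The run-it-three-times routine can also be scripted. The following is only a sketch, assuming Python 3: `time_runs` is a hypothetical helper of mine, it measures wall-clock time only (the "real" column above), and the demo command is a stand-in that should be replaced with the actual script and log file.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output; fail loudly if it exits non-zero.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere.
    demo_cmd = [sys.executable, "-c", "print(sum(range(100000)))"]
    results = time_runs(demo_cmd)
    print(" ".join("%.3fs" % t for t in results))
    print("best: %.3fs" % min(results))
```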

# verify that the outputs are the same
# btw, Python and Tcl write CRLF line endings by default (native platform is Windows)
$ for i in a b c; do dos2unix < $i | md5sum; done
83211bf4faa32495ca9eb52c6b520974 *-
83211bf4faa32495ca9eb52c6b520974 *-
83211bf4faa32495ca9eb52c6b520974 *-
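
The same check can be sketched in Python: normalise CRLF to LF (what dos2unix does here) and then compare MD5 digests. This assumes Python 3; `normalized_md5` is a hypothetical helper and the two byte strings are made-up sample outputs, not the real results.

```python
import hashlib


def normalized_md5(data):
    """MD5 of `data` (bytes) after converting CRLF line endings to LF."""
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


# Made-up sample outputs: same rows, different line endings.
unix_out = b"00 10 20 1\n01 30 40 2\n"
dos_out = b"00 10 20 1\r\n01 30 40 2\r\n"

# After normalisation the digests agree ...
print(normalized_md5(unix_out) == normalized_md5(dos_out))   # True
# ... while the raw digests differ because of the extra CR bytes.
print(hashlib.md5(unix_out).hexdigest() == hashlib.md5(dos_out).hexdigest())   # False
```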

It is clear that Python and Tcl come in neck and neck, at roughly 15 seconds per run. A comprehensive scripting language like Python or Tcl is definitely more versatile than a specific tool such as AWK. Below is the source code for the various programs in case you are interested in the details:

$ cat
#! /cygdrive/c/Python25/python

import sys

if len(sys.argv) != 2:
        print "Usage:", sys.argv[0], ""

for i in range(24):
        index='%02d' % i

while line:
        sc[hour] += int(fields[18])
        cs[hour] += int(fields[19])
        cnt[hour] += 1

for i in k:
        print i,sc[i],cs[i],cnt[i]
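
For anyone who wants something to run on a current interpreter, here is a minimal sketch of the same hourly tally in Python 3. It is not the program timed above: the field positions (time in field 1, bytes in fields 18 and 19) are taken from the listings in this post, while the `hourly_totals` name, the skipping of `#` header lines and the guard against short lines are assumptions of mine.

```python
import sys


def hourly_totals(lines):
    """Tally bytes sent, bytes received and hits for each hour "00".."23"."""
    hours = ["%02d" % h for h in range(24)]
    sc = dict.fromkeys(hours, 0)    # bytes sent
    cs = dict.fromkeys(hours, 0)    # bytes received
    cnt = dict.fromkeys(hours, 0)   # hits
    for line in lines:
        if line.startswith("#"):    # assumption: skip log header lines
            continue
        fields = line.split()
        if len(fields) < 20:        # assumption: ignore blank or short lines
            continue
        hour = fields[1].split(":")[0]   # time field looks like hh:mm:ss
        if hour not in sc:               # assumption: ignore malformed hours
            continue
        sc[hour] += int(fields[18])
        cs[hour] += int(fields[19])
        cnt[hour] += 1
    return [(h, sc[h], cs[h], cnt[h]) for h in hours]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fp:
        for row in hourly_totals(fp):
            print(*row)
```

Keeping the tally in a function that accepts any iterable of lines means it can be tried on a couple of hand-made lines before pointing it at a 170MB file.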

$ cat
#! /bin/sh

if [ $# -ne 1 ]; then
        echo "Usage: $0 <input-log>"
        exit 1
fi

awk '
        for ( h=0 ; h<24 ; ++h ) {
                print hh, sc[hh], cs[hh], hit[hh]
}' $1

$ cat sum.tcl
#! /cygdrive/c/ActiveTcl/

if { $argc != 1 } {
        puts stderr "Usage: $argv0 "
        exit 1
}
set logfile [lindex $argv 0]
if { ![file exists $logfile] } {
        puts stderr "Error. $logfile does not exist"
        exit 2
}

# initialise to 0
set hours {}
for { set h 0 } { $h < 24 } { incr h } {
        lappend hours [format {%02d} $h]
}
foreach hr $hours {
        set sc($hr) 0
        set cs($hr) 0
        set hit($hr) 0
}

set fp [open $logfile r]
while { [gets $fp line] >= 0 } {
        set time [lindex $line 1]
        set hr [lindex [split $time :] 0]
        incr sc($hr) [lindex $line 18]
        incr cs($hr) [lindex $line 19]
        incr hit($hr)
}
close $fp

foreach hr $hours {
        puts "$hr $sc($hr) $cs($hr) $hit($hr)"
}

I have covered only 200 of the 746 pages of Learning Python, 3rd Edition, and hope to explore more features as I go into the details. So far, I particularly like the feature-rich OO methods available in the core objects. However, I still have not figured out how to differentiate between an attribute and a method of an object.
