Friday, July 11, 2008

Python Runs A Lot Faster Than Tcl

As I mentioned in this morning's blog post, I decided to take on this task in Python. Guess what: Python runs 7 times faster than Tcl on the same small subset of the CSV dataset. My colleague was very impressed when he ran the program on the real dataset, a CSV file of more than 190 MB. It took just under 15 seconds to do all the conversions and create a new CSV file. BTW, it took more than 15-20 seconds just to load that CSV file into Excel.

Because lists are mutable in Python, there is very little penalty for changing the content of an individual item in a list. Python does not have to re-create a separate list object when the content is changed, whereas Tcl often has to do that. In Tcl, performance can deteriorate noticeably when we have to deal with long lists.
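
A quick way to see the in-place behaviour described above is to check that a list keeps its identity after an item assignment. This is only a minimal sketch with made-up row values; it is not part of the conversion script.

```python
# Hypothetical row values, used only for illustration.
row = ["alice", "", "555-0100"]
before = id(row)                  # identity of the list object

row[1] = "default@somewhere"      # item assignment mutates the list in place

same_object = (id(row) == before) # the same list object is still in use
print(same_object, row)
```

The assignment touches a single slot, so its cost does not depend on how long the list is.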

Here is my second Python code snippet. It is nothing fancy, but it works and runs fast. I am very impressed with Python's performance. BTW, exception handling (try: except:) in Python has less CPU overhead and gives finer control than Tcl's catch.

import csv, sys

if len(sys.argv) != 4:
 print "Usage:", sys.argv[0], "csv(in)", "mapping", "csv(out)"
 sys.exit(1)


#
# mapping as dict object
#
map={}
for line in open(sys.argv[2],"r"):
 k,v=line.rstrip().split(":")
 map[k]=v


reader=csv.reader(open(sys.argv[1],"rb"))
writer=csv.writer(open(sys.argv[3],"wb"))


#
# find header indices
#
header=reader.next()
i_email    =header.index("Email Address")
i_telephone=header.index("Telephone")
i_assigned =header.index("* Assigned-to (Person)")
....

writer.writerow(header)
for row in reader:
 if row[i_email] == "":
  row[i_email]="default@somewhere"
 if row[i_telephone] == "":
  row[i_telephone]="123456789"

 try:
  row[i_assigned]=map[row[i_assigned]]
 except KeyError:
  # no mapping for this name, so blank the field
  row[i_assigned]=""

....

 writer.writerow(row)
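
For anyone running Python 3, the same conversion might look roughly like the sketch below. It covers only the three columns shown above and leaves out the elided steps. The function names load_mapping and convert are made up for illustration. Skipping blank mapping lines and splitting on the first colon only are my assumptions, and dict.get stands in for the try/except lookup.

```python
import csv


def load_mapping(path):
    """Read 'key:value' lines into a dict (hypothetical helper)."""
    mapping = {}
    with open(path, "r") as f:
        for line in f:
            line = line.rstrip()
            if not line:
                continue  # assumption: blank lines are ignored
            # assumption: split on the first colon only
            key, value = line.split(":", 1)
            mapping[key] = value
    return mapping


def convert(in_path, mapping, out_path):
    """Copy the CSV, filling defaults and remapping the assigned-to column."""
    # newline="" lets the csv module handle line endings itself
    with open(in_path, newline="") as fin, \
         open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)

        # find header indices
        header = next(reader)
        i_email = header.index("Email Address")
        i_telephone = header.index("Telephone")
        i_assigned = header.index("* Assigned-to (Person)")

        writer.writerow(header)
        for row in reader:
            if row[i_email] == "":
                row[i_email] = "default@somewhere"
            if row[i_telephone] == "":
                row[i_telephone] = "123456789"
            # unmapped names become an empty string
            row[i_assigned] = mapping.get(row[i_assigned], "")
            writer.writerow(row)
```

It could be called as convert("in.csv", load_mapping("mapping.txt"), "out.csv"), where the three file names are placeholders.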

While I am still trying to finish the book Learning Python, 3rd Ed. (I have only managed to get through half of it), I am always looking for opportunities to apply what I learn.
