Thursday, July 10, 2008

Tcl Code Refactoring

My colleague is doing the BMC Remedy migration and has dumped the data from the old version in CSV format. Among the migration requirements, certain fields that used to be optional are now mandatory, and the user IDs have to be replaced by real user names, just to name a few.

It is not difficult to parse the mapping file and store it in a Tcl associative array so that it can be used for dynamic user name substitution.
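As a rough illustration of that step, here is a minimal sketch. The mapping format (one "userID,real name" pair per line), the sample data, and the helper name realName are all assumptions made for the example, not the actual migration file.

```tcl
# Hypothetical mapping data: one "userID,real name" pair per line.
set mappingData "jdoe,John Doe
asmith,Alice Smith"

array set map {}
foreach line [split $mappingData \n] {
    set line [string trim $line]
    if {$line eq ""} { continue }
    # Split on the first comma only, so a name containing a comma survives.
    set pos [string first , $line]
    if {$pos < 0} { continue }
    set id   [string trim [string range $line 0 [expr {$pos - 1}]]]
    set name [string trim [string range $line [expr {$pos + 1}] end]]
    set map($id) $name
}

# Hypothetical helper: substitute only when the ID is known,
# otherwise hand back the original value untouched.
proc realName {id} {
    global map
    if {[info exists map($id)]} { return $map($id) }
    return $id
}

puts [realName jdoe]
puts [realName nobody]
```

An unknown ID falls through unchanged, which keeps cells that already hold a real name intact.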

The csv module from Tcllib proved to be extremely useful for parsing the CSV output. To ensure the mapping works properly, I need a switch that decides, per column, whether to replace the user ID with the real user name or to set the default email address / telephone number when the cell is blank. The switch body has to be generated dynamically because the patterns inside a braced switch body do not undergo variable substitution. However, it is very inefficient to build that Tcl code every time within the while loop.
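The substitution problem is easy to reproduce in isolation. In this small sketch (the variable names target, r1 and r2 are made up for the demonstration), the braced form treats the pattern as the literal string "$target", while passing patterns as separate arguments lets the Tcl parser substitute them first.

```tcl
set target 3
set i 3

# Braced body: the pattern is the literal text "$target", so 3 never matches.
set r1 "no match"
switch -- $i {
    $target { set r1 "matched" }
}

# Patterns as separate arguments: $target is substituted before switch sees it.
set r2 "no match"
switch -- $i $target { set r2 "matched" }

puts "braced: $r1"
puts "separate arguments: $r2"
```

The separate-argument form gets unwieldy with many cases, which is one reason to build the body as a string instead.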

It took 10 seconds to process a CSV file of 554 rows x 285 columns. I am definitely not satisfied with that run time, and I am sure Tcl can do better. It is code refactoring time. By taking the switch body out of the loop and generating it once with subst, we avoid rebuilding that part of the code over and over again. We can also collapse all the matching cases into a single command body using the "-" trick in switch to avoid repeating code. Below is a code snippet:

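# Build the switch body once: subst fills in the column indices and defaults,
# while \$ defers $cell and $map(...) until the switch actually runs.
set switchBody [subst -nocommands {
 $index(email) { 
  if { [string length \$cell] == 0 } {
   set cell {$defaultEmail}
  }
 }
 $index(telephone) { 
  if { [string length \$cell] == 0 } {
   set cell {$defaultTelephone}
  }
 }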
 $index(assigned) -
 $index(closed) -
 $index(fixed) -
 $index(response) { 
  if { [info exists map(\$cell)] == 1 } {
   set cell \$map(\$cell)
  }
 }
}]

...

while { [gets $fp line] >= 0 } {
 set lcsv [::csv::split -alternate $line]
 set lcsvN [llength $lcsv]
 set new {}
 for { set i 0 } { $i < $lcsvN } { incr i } {
  set cell [lindex $lcsv $i]
  switch $i $switchBody
  lappend new $cell
 }
 puts [::csv::join $new]
}

Now the run time is down to 3.8 seconds, which is about 2.6 times faster. I may have to tune this code further when my colleague provides the real data source with a few hundred thousand records.
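To compare the two approaches on a small scale, something like the sketch below can be run with Tcl's built-in time command. Everything in it is illustrative: the column positions, default, mapping, and proc names are made up, and the "slow" variant is only a guess at what rebuilding the body inside the loop looked like.

```tcl
# Hypothetical column positions, default, and mapping, for illustration only.
array set index {email 1 assigned 2 closed 3}
set defaultEmail "helpdesk@example.com"
array set map {jdoe {John Doe}}

# Template for the switch body; subst fills in indices and the default,
# while \$ defers $cell and $map(...) until the switch runs.
set template {
    $index(email) {
        if { [string length \$cell] == 0 } { set cell {$defaultEmail} }
    }
    $index(assigned) -
    $index(closed) {
        if { [info exists map(\$cell)] } { set cell \$map(\$cell) }
    }
}

# Slow variant (assumed): the body is rebuilt for every single cell.
proc rowRebuild {row} {
    global template index defaultEmail map
    set new {}
    set i 0
    foreach cell $row {
        switch -- $i [subst -nocommands $template]
        lappend new $cell
        incr i
    }
    return $new
}

# Fast variant: the body is built once, outside any loop.
set switchBody [subst -nocommands $template]
proc rowPrebuilt {row} {
    global switchBody map
    set new {}
    set i 0
    foreach cell $row {
        switch -- $i $switchBody
        lappend new $cell
        incr i
    }
    return $new
}

set row {42 {} jdoe unknown}
puts "rebuild:  [time {rowRebuild $row} 1000]"
puts "prebuilt: [time {rowPrebuilt $row} 1000]"
```

Both procs should produce the same row; only the per-iteration cost reported by time differs, and the actual numbers will depend on the machine and Tcl version.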
