Wednesday, July 23, 2008

I Am Naive In ...

Yesterday a colleague told me that my program did not work for a specific mapping when converting BMC Remedy CSV data. After studying the input mapping file, I realised that what he had told me earlier was incorrect. He had said that each record in the mapping file is colon-separated and consists of two fields. With this information in mind, I took a shortcut in programming and did not check the integrity of the mapping file.

I admit I was naive in this task because I thought it was just a throwaway solution and therefore not worth rigorous testing. Obviously I was wrong. Anyway, since I am learning Python, I wondered whether Python can catch this type of error. The snippet below shows how I did it in Tcl. If I use "id" after the foreach loop, it is no longer "CCH" as it should be for the bad data set: the split yields three elements, so the loop runs a second time and assigns the third element to "id" and an empty string to "value".

$ tclsh
% set goodData {CCH:Chan Chi Hung Pte Ltd}
CCH:Chan Chi Hung Pte Ltd

% set badData {CCH:Chan Chi Hung Pte Ltd, Tel:1234567}
CCH:Chan Chi Hung Pte Ltd, Tel:1234567

% foreach { id value } [split $goodData :] {}

% puts "id=$id, value=$value"
id=CCH, value=Chan Chi Hung Pte Ltd

% foreach { id value } [split $badData :] {}

% puts "id=$id, value=$value"
id=1234567, value=

%

To split the record into just two fields at the first colon, I need a non-greedy quantifier in the regular expression. A non-greedy quantifier matches the same possibilities as the greedy one but prefers the smallest number of matches rather than the largest, so the first group stops at the first colon.
% regexp {^(.*?):(.*)$} $badData x id value
1

% puts "id=$id, value=$value"
id=CCH, value=Chan Chi Hung Pte Ltd, Tel:1234567
%
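
For comparison, here is a minimal sketch of the same greedy versus non-greedy behaviour using Python's re module. It is written for Python 3, and the variable names (bad_data, greedy, lazy) are only for illustration.

```python
import re

bad_data = 'CCH:Chan Chi Hung Pte Ltd, Tel:1234567'

# Greedy: the first group takes as much as it can, so the record is
# cut at the LAST colon.
greedy = re.match(r'^(.*):(.*)$', bad_data).groups()

# Non-greedy: the first group takes as little as it can, so the record
# is cut at the FIRST colon.
lazy = re.match(r'^(.*?):(.*)$', bad_data).groups()

print(greedy)
print(lazy)
```

The greedy pattern leaves "Tel" inside the id, while the non-greedy one keeps the id as "CCH".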

In Python, unpacking raises an exception if the returned list does not contain exactly two strings. Did you know that the string split method takes an optional second argument that limits the number of splits? This gives you the flexibility to choose how you want to split the string.

$ python
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> goodData='CCH:Chan Chi Hung Pte Ltd'

>>> badData='CCH:Chan Chi Hung Pte Ltd, Tel: 1234567'

>>> [id,value]=goodData.split(':')

>>> print 'id=%s, value=%s' % (id,value)
id=CCH, value=Chan Chi Hung Pte Ltd

>>> [id,value]=badData.split(':')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack

>>> [id,value]=badData.split(':',1)

>>> print 'id=%s, value=%s' % (id,value)
id=CCH, value=Chan Chi Hung Pte Ltd, Tel: 1234567

>>>

The moral of the story:
Do not trust your users totally.
Test your program with all kinds of data sets to ensure it can handle various situations.
Program defensively, even when it is a throwaway solution.
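
As a rough illustration of the last point, the sketch below checks each mapping record instead of assuming it is well formed. It is written for Python 3. The function names parse_mapping_line and load_mapping are hypothetical, and so are the rules it enforces (split at the first colon, reject records with no colon or with an empty id or value); they are my assumptions, not the real mapping file specification.

```python
def parse_mapping_line(line, lineno=0):
    """Split one mapping record into (id, value) at the first colon."""
    record = line.strip()
    # partition() always returns three parts; sep is '' if no colon exists.
    key, sep, value = record.partition(':')
    if not sep:
        raise ValueError('line %d: no colon separator: %r' % (lineno, record))
    key, value = key.strip(), value.strip()
    if not key or not value:
        raise ValueError('line %d: empty id or value: %r' % (lineno, record))
    return key, value


def load_mapping(lines):
    """Return (mapping, errors) so that bad records are reported, not hidden."""
    mapping = {}
    errors = []
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            continue  # skip blank lines (assumed to be harmless)
        try:
            key, value = parse_mapping_line(line, lineno)
        except ValueError as exc:
            errors.append(str(exc))
            continue
        mapping[key] = value
    return mapping, errors


sample = [
    'CCH:Chan Chi Hung Pte Ltd, Tel:1234567',
    '',
    'no separator here',
    ':missing id',
]
mapping, errors = load_mapping(sample)
print(mapping)
for message in errors:
    print(message)
```

Collecting the errors rather than stopping at the first one makes it easier to see every bad record in a single run.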
