Tuesday, April 07, 2009

Think Big

Suppose some command output contains hexadecimal numbers that need to be converted to decimal. You will likely reach for bc to work it out:
$ bc
ibase=16
ABCDEF
11259375
abcdef
syntax error on line 3, teletype
0089
137
^D

To use this in a shell script, you need to do the following. Bear in mind that bc only accepts uppercase hex digits.

$ echo "ibase=16; ABCDEF" | bc
11259375

$ echo "ibase=16; abcdef" | bc
syntax error on line 1, teletype

$ echo "ibase=16; abcdef" | tr '[a-z]' '[A-Z]' | bc
syntax error on line 1, teletype
1123455

$ echo "abcdef" | tr '[a-z]' '[A-Z]' | xargs -I{} echo 'ibase=16;{}' | bc
11259375
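The third attempt above fails because tr uppercases the whole line, so "ibase" becomes "IBASE", which bc rejects. The input base then stays at 10, which appears to be why ABCDEF comes out as 1123455. A minimal sketch that uppercases only the hex digits is shown here; bc_expr is a hypothetical helper name, not part of the original post.

```shell
# Hypothetical helper: build a bc expression from a hex string,
# uppercasing only the digits a-f so that "ibase" stays lowercase.
bc_expr() {
    printf 'ibase=16; %s\n' "$(printf '%s' "$1" | tr 'abcdef' 'ABCDEF')"
}

bc_expr abcdef      # prints: ibase=16; ABCDEF
# To do the actual conversion, pipe the expression into bc:
#   bc_expr abcdef | bc
```

This still starts a bc process for every value, so it only tidies up the one-off case.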

This may work fine for a small data set. However, if you need to loop through thousands of lines to convert hex to decimal, it becomes a performance problem. Below I show some alternatives.

First, the traditional shell script way. You can see how slow it is when we throw a large data set at it.

$ cat hex.input
drive1 0089
drive2 0a2f
drive3 1FFE
drive4 980B
drive5 0011780c

$ cat hex2dec-sh.sh
#! /bin/sh


if [ $# -ne 1 ]; then
        echo "Usage: $0 <hex-input>"
        exit 1
fi


cat $1 | while read drive hex
do
        dec=`echo $hex | tr '[a-z]' '[A-Z]' | xargs -I{} echo "ibase=16; {}" | bc`
        echo "Drive=$drive, Hex=$hex, Dec=$dec"
done

$ time ./hex2dec-sh.sh hex.input
Drive=drive1, Hex=0089, Dec=137
Drive=drive2, Hex=0a2f, Dec=2607
Drive=drive3, Hex=1FFE, Dec=8190
Drive=drive4, Hex=980B, Dec=38923
Drive=drive5, Hex=0011780c, Dec=1144844

real    0m0.042s
user    0m0.011s
sys     0m0.065s

$ for i in `perl -e '$,=" "; print 1..1000'`
do
    cat hex.input
done > hex.big

$ wc -l hex.big
    5000 hex.big

$ time ./hex2dec-sh.sh hex.big > /dev/null

real    0m32.141s
user    0m9.131s
sys     0m56.458s
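Much of that time is probably spent creating processes: every input line starts a subshell plus tr, xargs, echo and bc. As a rough sketch, and assuming your shell has printf as a builtin (bash, dash and ksh93 do), the loop can stay in shell without starting any external command per line. The function name hex2dec_lines is made up for this example, it does no input validation, and I have not timed it here.

```shell
#! /bin/sh
# Sketch: convert "name hex" lines from stdin without forking per line.
# hex2dec_lines is a hypothetical name, not part of the original scripts.
hex2dec_lines() {
    while read -r drive hex
    do
        # printf treats a 0x-prefixed argument as a hexadecimal constant,
        # in either letter case, so neither tr nor bc is needed.
        printf 'Drive=%s, Hex=%s, Dec=%d\n' "$drive" "$hex" "0x$hex"
    done
}

printf 'drive1 0089\ndrive2 0a2f\n' | hex2dec_lines
# Drive=drive1, Hex=0089, Dec=137
# Drive=drive2, Hex=0a2f, Dec=2607
```

To run it against a file, redirect the input: hex2dec_lines < hex.big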

What if you have a few million lines to convert? You will need a lot of coffee breaks. The alternative is to use a high-level scripting language such as Perl, Python or Tcl. In fact, you can write your own function in AWK to do this kind of thing. Here I will show you a Perl one-liner and the AWK way.

Perl one-liner:

$ time perl -ne 'chomp();@l=split(/\s+/);print "Drive=",$l[0]," Hex=",$l[1]," Dec=",hex($l[1]),"\n"' < hex.input
Drive=drive1 Hex=0089 Dec=137
Drive=drive2 Hex=0a2f Dec=2607
Drive=drive3 Hex=1FFE Dec=8190
Drive=drive4 Hex=980B Dec=38923
Drive=drive5 Hex=0011780c Dec=1144844

real    0m0.008s
user    0m0.003s
sys     0m0.005s

$ time perl -ne 'chomp();@l=split(/\s+/);print "Drive=",$l[0]," Hex=",$l[1]," Dec=",hex($l[1]),"\n"' < hex.big  > /dev/null

real    0m0.044s
user    0m0.038s
sys     0m0.005s

AWK way:

$ cat hex2dec-awk.sh
#! /bin/sh


if [ $# -ne 1 ]; then
        echo "Usage: $0 <hex-input>"
        exit 1
fi


nawk '
function hex2dec(hex, h, i, factor, n, sum) {
        n = length(hex)
        factor = 1
        sum = 0
        for ( i=n ; i>0 ; --i ) {
                h = substr(hex, i, 1)
                if ( h == "a" || h == "A" ) { h=10 }
                if ( h == "b" || h == "B" ) { h=11 }
                if ( h == "c" || h == "C" ) { h=12 }
                if ( h == "d" || h == "D" ) { h=13 }
                if ( h == "e" || h == "E" ) { h=14 }
                if ( h == "f" || h == "F" ) { h=15 }
                sum += factor * h
                factor *= 16
        }
        return sum
}
{
        printf("Drive=%s, Hex=%s, Dec=%s\n", $1, $2, hex2dec($2))
}' $1

$ time ./hex2dec-awk.sh hex.input
Drive=drive1, Hex=0089, Dec=137
Drive=drive2, Hex=0a2f, Dec=2607
Drive=drive3, Hex=1FFE, Dec=8190
Drive=drive4, Hex=980B, Dec=38923
Drive=drive5, Hex=0011780c, Dec=1144844

real    0m0.008s
user    0m0.002s
sys     0m0.006s

$ time ./hex2dec-awk.sh hex.big > /dev/null 2>&1

real    0m0.100s
user    0m0.093s
sys     0m0.006s
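The chain of if statements in hex2dec can also be written as a lookup into a digit string. The sketch below is one possible variant, not a replacement the original post measured: it walks the digits left to right and multiplies the running sum by 16 each step. It assumes an awk with tolower() (nawk and POSIX awk have it), calls awk rather than nawk, and treats any non-hex character as -1 without reporting an error. The wrapper name hex2dec_awk is hypothetical.

```shell
#! /bin/sh
# Sketch: hex2dec via a digit lookup string instead of an if-chain.
# hex2dec_awk is a hypothetical wrapper; it reads files given as
# arguments, or stdin when there are none.
hex2dec_awk() {
    awk '
    function hex2dec(hex,    i, d, sum) {
            hex = tolower(hex)
            sum = 0
            for ( i=1 ; i<=length(hex) ; ++i ) {
                    # index() is 1-based, so subtract 1 to get the digit value
                    d = index("0123456789abcdef", substr(hex, i, 1)) - 1
                    sum = sum * 16 + d
            }
            return sum
    }
    {
            printf("Drive=%s, Hex=%s, Dec=%s\n", $1, $2, hex2dec($2))
    }' "$@"
}

printf 'drive4 980B\n' | hex2dec_awk
# Drive=drive4, Hex=980B, Dec=38923
```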

For 5000 lines, the run time drops from 32 seconds to about a tenth of a second or less. I am sure you can see the performance difference between the three implementations. Next time, think big! Think about bigger data sizes.
