Saturday, November 29, 2008

Avoid Using Temporary Files

If you usually write shell scripts by dumping output into temporary files for later processing, I can tell you that in most cases the same job can be done without writing anything to the file system, simply by using a few UNIX shell tricks. By doing this, your script will be more efficient and more portable.

Suppose you have two functions (or commands) that produce the same number of lines of output, and you want to 'paste' the two outputs together. The easy way out is to store each output in a separate file. In this post I will use two functions, calc1 (which calculates n*n) and calc2 (which calculates n*n*n).

$ cat t0.sh
#! /bin/sh

PATH=/usr/bin:/bin

seq()
{
        nawk -v start=$1 -v end=$2 '
                END {for(i=start;i<=end;++i){print i}}' /dev/null
}
calc1()
{
        for i in `seq $1 $2`
        do
                echo `expr $i \* $i`
        done
}
calc2()
{
        for i in `seq $1 $2`
        do
                echo `expr $i \* $i \* $i`
        done
}

calc1 1 10 > sometempfile1
calc2 1 10 > sometempfile2
paste sometempfile1 sometempfile2
rm -f sometempfile1 sometempfile2

$ ./t0.sh
1       1
4       8
9       27
16      64
25      125
36      216
49      343
64      512
81      729
100     1000

In this scenario, the outputs of calc1 and calc2 have the same number of records. We can take advantage of this by grouping the two commands in a UNIX sub-shell, so that their combined output goes down a single pipe to AWK. In AWK, I store each line in an associative array (line) indexed by the record number (NR) and process the array in the END block: the first half of the array is the output of calc1 and the second half is the output of calc2.

$ cat t1.sh
#! /bin/sh

PATH=/usr/bin:/bin

seq()
{
        nawk -v start=$1 -v end=$2 '
                END {for(i=start;i<=end;++i){print i}}' /dev/null
}
calc1()
{
        for i in `seq $1 $2`
        do
                echo `expr $i \* $i`
        done
}
calc2()
{
        for i in `seq $1 $2`
        do
                echo `expr $i \* $i \* $i`
        done
}

# using sub shell to group the output
( calc1 1 10 ; calc2 1 10) | \
nawk '
{ line[NR]=$0 }
END {
        for(i=1;i<=NR/2;++i) {
                print line[i] "\t" line[NR/2+i]
        }
}'

$ ./t1.sh
1       1
4       8
9       27
16      64
25      125
36      216
49      343
64      512
81      729
100     1000

As I mentioned earlier, one can accomplish the same task without temporary files.

But we are not done yet. What if the two outputs do not have the same number of records?

( calc1 1 10 ; calc2 1 13 ) | ...
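Obviously the second script will break, because it assumes that the first half of the combined output belongs to calc1 and the second half to calc2. Can you fix it for me? Do give it a try, and I will provide my solution in a couple of days' time.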
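
If you want a hint, here is one possible approach. It is only a minimal sketch and not necessarily the solution I will post: print a marker line between the two outputs so that AWK knows where the first one ends. The marker string "@@SEP@@" and the function name paste_uneven are made up for this illustration, and the marker is assumed never to appear in the real output. The calc functions are simplified stand-ins that use shell arithmetic instead of seq and expr, and I use plain awk here (on Solaris you would use nawk as in the scripts above).

```shell
#! /bin/sh

# Simplified stand-ins for calc1 (n*n) and calc2 (n*n*n).
calc1()
{
        i=$1
        while [ "$i" -le "$2" ]
        do
                echo $((i * i))
                i=$((i + 1))
        done
}
calc2()
{
        i=$1
        while [ "$i" -le "$2" ]
        do
                echo $((i * i * i))
                i=$((i + 1))
        done
}

# Hypothetical helper: paste two outputs of different lengths.
# The marker line (assumed never to occur in the data) splits them.
paste_uneven()
{
        ( calc1 1 10 ; echo "@@SEP@@" ; calc2 1 13 ) | \
        awk -v sep="@@SEP@@" '
        $0 == sep { second = 1; next }  # switch to the second output
        !second   { a[++n1] = $0; next } # lines from calc1
                  { b[++n2] = $0 }       # lines from calc2
        END {
                max = (n1 > n2) ? n1 : n2
                for (i = 1; i <= max; ++i) {
                        # a missing entry prints as an empty field
                        print a[i] "\t" b[i]
                }
        }'
}

paste_uneven
```

With these arguments the script prints 13 lines. The first 10 look like the output of t1.sh, and the last 3 have an empty first column, which is the same thing paste does when one file is shorter than the other.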
