Saturday, January 24, 2009

Shell Script Performance, Part 3

Real-world test scenarios are sometimes hard to come by, and you definitely want to be prepared when you land in one, so it is good to be able to create such a scenario yourself. In Part 2 of this series on shell script performance, I asked my readers to come up with the fastest way to create 1000 files. Of course, you do not want to wait minutes or hours for just 1000 files. What if you want to test with 10,000 or even 100,000 files?

With that many files in your directory, you can practice your shell scripting by changing the prefix, suffix, or extension, padding with zeros, and so on. The number of things you can script is endless.
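
As one small exercise of this kind, the sketch below changes the extension of every file-*.txt to .dat using only shell parameter expansion. The scratch directory, the 'file' prefix, the five sample files and the .dat extension are all assumptions made for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real gets renamed.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Create a handful of sample files: file-1.txt ... file-5.txt
for i in 1 2 3 4 5; do
        : > "file-$i.txt"
done

# ${f%.txt} strips the shortest trailing match of ".txt" from $f
for f in file-*.txt; do
        mv "$f" "${f%.txt}.dat"
done

renamed=$(ls file-*.dat | wc -l | tr -d ' ')
echo "renamed: $renamed"
```

Note that mv is still one process per file, so on a very large directory this loop pays the same per-process cost discussed in this series.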

In Part 1, we learned that process creation is very expensive and that we should avoid running too many commands, especially within a loop. So I am not going to use the traditional approach, like this one:
count=1; while [ $count -le 100 ]; do touch file-$count.txt; count=`expr $count + 1`; done
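
For comparison, the same loop can be written without any external commands. This is only a minimal sketch, assuming a POSIX shell: $(( )) arithmetic stands in for expr, and the ':' builtin with a redirection stands in for touch, so no process is created per file. The scratch directory is my own addition and not part of the measurements in this post.

```shell
#!/bin/sh
# Work in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1

count=1
while [ "$count" -le 100 ]; do
        # ':' is a builtin no-op; the >> redirection alone creates the file
        # (and, like touch, leaves existing content untouched)
        : >> "file-$count.txt"
        # $(( )) is evaluated by the shell itself, unlike `expr`
        count=$((count + 1))
done

created=$(ls | wc -l | tr -d ' ')
echo "created $created files"
```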

Below are various ways to 'skin a cat':

  1. Use 'seq' to generate the numbers 1 to 1000 for a for loop, creating one new file per iteration
  2. Use the Bash shell's built-in brace expansion
  3. Use 'seq -f' to generate the file names and pass them all to a single touch [thanks to a comment by pjz]
  4. create.tcl - a Tcl implementation
  5. A Python implementation

And the run time:

  1. 'seq' in a for loop - 2m36.452s
  2. Bash brace expansion - 0m2.371s
  3. 'seq -f' with a single touch - 0m2.075s
  4. create.tcl (Tcl) - 0m7.831s
  5. Python - 0m2.403s
As you can see, the fewer commands you run, the shorter the run time. In this case, pjz won! I have previously introduced Bash shell brace expansion, which is a very useful feature for generating command arguments. See this blog post for how I created 48 'devices' (c0d0t0s0, ...) in a single command line.
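
To illustrate what brace expansion does: Bash expands each brace group into its alternatives and forms every combination before the command runs. The sketch below is not the command from that post; the choice of 6 targets and 8 slices is an assumption picked only because it yields 48 names.

```shell
#!/bin/bash
# Brace expansion is a Bash feature, not part of plain POSIX sh.
# 6 targets x 8 slices = 48 device names
devices=$(echo c0d0t{0,1,2,3,4,5}s{0,1,2,3,4,5,6,7})

# Show the first few names, then the total count
echo "$devices" | tr ' ' '\n' | head -3
total=$(echo "$devices" | wc -w | tr -d ' ')
echo "total: $total"
```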

$ cat
#! /bin/sh

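if [ $# -ne 3 ]; then
        echo "Usage: $0 <prefix> <start#> <end#>"
        exit 1
fi

prefix=$1

# One touch process per file: this per-iteration process creation is the slow part
for i in `seq -w $2 $3`
do
        touch $prefix-$i.txt
done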

$ time ./ file 1 1000

real    2m36.452s
user    0m4.260s
sys     0m24.288s

$ rm -f file-*

$ cat
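#! /bin/bash

if [ $# -ne 1 ]; then
        echo "Usage: $0 <prefix>"
        exit 1
fi

prefix=$1
n=0,1,2,3,4,5,6,7,8,9

# Brace expansion happens before variable expansion, so eval re-parses the line once $n has been substituted
eval touch $prefix-0{$n}{$n}{$n}.txt
mv $prefix-0000.txt $prefix-1000.txt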

$ time ./ file

real    0m2.371s
user    0m0.031s
sys     0m0.856s

$ rm -f file-*
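
One caveat, raised in the comments below: eval re-parses its arguments as shell code, so a prefix containing ';' is executed rather than used as part of a file name. The sketch below shows this harmlessly, with echo in place of touch. The hostile prefix is a made-up example.

```shell
#!/bin/sh
# A hypothetical hostile argument; the injected command is a harmless echo.
prefix='plop; echo INJECTED; :'

# With eval, the string is parsed again and the injected echo runs.
unsafe=$(eval echo "$prefix")

# Without eval, the quoted value is passed through as plain data.
safe=$(printf '%s\n' "$prefix")

printf 'with eval:    %s\n' "$unsafe"
printf 'without eval: %s\n' "$safe"
```

Writing the braces literally, as one commenter suggests, removes the need for eval altogether.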

$ cat ./
#! /bin/sh

if [ $# -ne 3 ]; then
        echo "Usage: $0 <prefix> <start#> <end#>"
        exit 1
fi

prefix=$1
start=$2
end=$3

# A single touch receives every file name generated by seq
touch `seq -f "$prefix-%04g.txt" $start $end`

$ time ./ file 1 1000

real    0m2.075s
user    0m0.108s
sys     0m0.746s

$ rm -f file-*
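
This approach passes every file name to a single touch, so a long prefix or a large range can exceed the system's argument-length limit (a commenter below runs into exactly that). Here is a sketch of the xargs variant suggested there, which splits the names across as many touch invocations as needed. The scratch directory, the 'file' prefix and the range of 2000 are assumptions.

```shell
#!/bin/sh
# Work in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1

prefix=file
start=1
end=2000

# Upper bound on the combined size of arguments and environment for a new process
echo "ARG_MAX: $(getconf ARG_MAX)"

# seq writes one name per line; xargs batches them into touch commands
seq -f "$prefix-%04g.txt" "$start" "$end" | xargs touch

created=$(ls | wc -l | tr -d ' ')
echo "created $created files"
```

This assumes the prefix contains no whitespace or quote characters, which xargs would otherwise split on or interpret.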


$ cat create.tcl
#! /cygdrive/c/Tcl8.4.19/bin/tclsh

if { $argc != 3 } {
        puts stderr "Usage: $argv0 <prefix> <start#> <end#>"
        exit 1
}

set prefix [lindex $argv 0]
set start  [lindex $argv 1]
set end    [lindex $argv 2]
set pad    [string length $end]
set format "%s-%0${pad}d.txt"

for { set i $start } { $i <= $end } { incr i } {
        set fname [format $format $prefix $i]
        # Opening for write creates the file; close it right away
        set fp [open $fname w]
        close $fp
}

$ time ./create.tcl file 1 1000

real    0m7.831s
user    0m0.000s
sys     0m0.031s

$ rm -f file-*

$ cat
#! /usr/bin/python

import sys

if len(sys.argv) != 4:
        sys.stderr.write('Usage: %s <prefix> <start#> <end#>\n' % (sys.argv[0]))
        sys.exit(1)

prefix = sys.argv[1]
start = int(sys.argv[2])
end = int(sys.argv[3])
pad = len(sys.argv[3])
format = '%s-%0' + str(pad) + 'd.txt'
# range() runs on both Python 2 and Python 3
for i in range(start, end+1):
        fname = format % (prefix, i)
        # Opening for write creates the file; close it right away
        fp = open(fname, 'w')
        fp.close()

$ time ./ file 1 1000

real    0m2.403s
user    0m0.031s
sys     0m0.951s

$ rm -f file-*



Blogger CMoi said...

What about "touch $prefix-0{0..9}{0..9}{0..9}.txt" ?

7:11 PM  
Blogger CMoi said...

Another comment: you use eval with a parameter. I hope you will never use this in a script run through sudo :) A prefix like "plop; rm -rf /; echo" would be fun...

7:18 PM  
Blogger Tet said...

leto:~% ./ this_is_a_very_long_filename_designed_to_exceed_the_shell_argument_length_limit 1 2000
./ line 12: /bin/touch: Argument list too long

You can fix this with:

seq -f "$prefix-%04g.txt" $start $end | xargs touch

7:19 PM  
Blogger ST said...

Have you ever tried ksh93? On a MacBook Pro with a 2.2GHz Core2Duo, with the files created on a RAM disk, I measured the following results (this script does not fork):

$ cat create.ksh

(( $# != 3 )) && { print -u2 "usage: $0 prefix start# end#"; exit 1; }

prefix=$1
integer start=$2
integer end=$3
integer i

# ':' with a redirection creates each file without starting a new process
for ((i=start; i<=end; i++)) {
: >> $prefix-$i.txt
}

$ time ./create.ksh file 1 1000

real 0m0.103s
user 0m0.028s
sys 0m0.071s

$ rm file-*

9:18 PM  
Blogger chihungchan said...

Thanks for all the comments.

CMoi: Yes you are right, the 'eval' will be pretty dangerous with your input arguments. I was trying to be smart :-)

Tet: Thanks for highlighting the 'argument list too long' error. Indeed, xargs is the safest way to go. Normally I only switch to xargs when I encounter an error. Maybe I should stop this habit.

ST: My Cygwin does not come with ksh. After a slight modification of your Korn shell script, the run time on my Core Duo T7250 2GHz is about 2.3s. BTW, Cygwin does not have a RAM disk. Maybe I can try it on /tmp (a memory-based fs) in Solaris.

11:21 PM  
