Shell Script Performance, Part 3
Real test scenarios are sometimes hard to come by, and you definitely want to be prepared when you find yourself in one, so it helps to be able to create such a scenario yourself. In Part 2 of this series on shell script performance, I asked my readers to come up with the fastest way to create 1000 files. Of course, you do not want to wait minutes or hours for just 1000 files. What if you want to test with 10,000 or even 100,000 files?
With that many files in your directory, you can practice your shell scripting by changing the prefix, suffix, or extension, padding with zeros, and so on; the number of things you can script is endless.
In Part 1, we learned that process creation is a very expensive task and that we should avoid running too many commands, especially within a loop. So I am not going to use the traditional approach like this one:
count=1; while [ $count -le 100 ]; do touch file-$count.txt; count=`expr $count + 1`; done
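For comparison, here is a minimal sketch of the same kind of loop that avoids starting an external command per file. It assumes a POSIX shell in which `[` and `:` are built-ins (true for bash, dash and ksh); the function name `create_builtin` is made up for illustration, and no timing is claimed for it here.

```shell
#!/bin/sh
# Sketch: create <prefix>-<n>.txt for n = start..end without forking
# one process per file. create_builtin is a hypothetical helper name.
create_builtin() {
    prefix=$1
    i=$2
    end=$3
    while [ "$i" -le "$end" ]; do
        # ':' is a built-in no-op; the redirection alone creates the file.
        # '>>' is used so an already existing file is not truncated.
        : >> "$prefix-$i.txt"
        # POSIX arithmetic expansion replaces the external 'expr' command.
        i=$((i + 1))
    done
}
# Example call: create_builtin file 1 1000
```

The loop is still there, but each iteration now stays inside the shell: `touch` is replaced by a redirection and `expr` by `$((...))`.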
Below are various ways to 'skin a cat':
- create1.sh - use 'seq' to generate the numbers 1 to 1000 for a for loop, with each iteration creating a new file
- create2.sh - use the Bash shell's built-in brace expansion
- create3.sh - use 'seq -f' to generate the file name and supply that to touch to create all the files. [Thanks to comment by pjz]
- create.tcl - a Tcl implementation
- create.py - a Python implementation
And the run time:
- create1.sh - 2m36.452s
- create2.sh - 0m2.371s
- create3.sh - 0m2.075s
- create.tcl - 0m7.831s
- create.py - 0m2.403s
$ cat create1.sh
#! /bin/sh
if [ $# -ne 3 ]; then
    echo "Usage: $0 <prefix> <start#> <end#>"
    exit 1
fi
prefix=$1
start=$2
end=$3
for i in `seq -w $2 $3`
do
    touch $prefix-$i.txt
done
$ time ./create1.sh file 1 1000
real 2m36.452s
user 0m4.260s
sys 0m24.288s
$ rm -f file-*
$ cat create2.sh
#! /bin/sh
if [ $# -ne 1 ]; then
    echo "Usage: $0 <prefix>"
    exit 1
fi
prefix=$1
n="0,1,2,3,4,5,6,7,8,9"
eval touch $prefix-0{$n}{$n}{$n}.txt
mv $prefix-0000.txt $prefix-1000.txt
$ time ./create2.sh file
real 0m2.371s
user 0m0.031s
sys 0m0.856s
$ rm -f file-*
$ cat ./create3.sh
#! /bin/sh
if [ $# -ne 3 ]; then
    echo "Usage: $0 <prefix> <start#> <end#>"
    exit 1
fi
prefix=$1
start=$2
end=$3
touch `seq -f "$prefix-%04g.txt" $start $end`
$ time ./create3.sh file 1 1000
real 0m2.075s
user 0m0.108s
sys 0m0.746s
$ rm -f file-*
$ cat create.tcl
#! /cygdrive/c/Tcl8.4.19/bin/tclsh
if { $argc != 3 } {
    puts stderr "Usage: $argv0 <prefix> <start#> <end#>"
    exit 1
}
set prefix [lindex $argv 0]
set start [lindex $argv 1]
set end [lindex $argv 2]
set pad [string length $end]
set format "%s-%0${pad}d.txt"
for { set i $start } { $i <= $end } { incr i } {
    set fname [format $format $prefix $i]
    set fp [open $fname w]
    close $fp
}
$ time ./create.tcl file 1 1000
real 0m7.831s
user 0m0.000s
sys 0m0.031s
$ rm -f file-*
$ cat create.py
#! /usr/bin/python
import sys
if len(sys.argv) != 4:
    sys.stderr.write('Usage: %s <prefix> <start#> <end#>' % (sys.argv[0]))
    exit(1)
prefix = sys.argv[1]
start = int(sys.argv[2])
end = int(sys.argv[3])
pad = len(sys.argv[3])
format = '%s-%0' + str(pad) + 'd.txt'
for i in xrange(start, end+1):
    fname = format % (prefix, i)
    fp = open(fname, 'w')
    fp.close()
$ time ./create.py file 1 1000
real 0m2.403s
user 0m0.031s
sys 0m0.951s
$ rm -f file-*
Labels: performance, shell script, Tcl
5 Comments:
What about "touch $prefix-0{0..9}{0..9}{0..9}.txt" ?
Another comment about create2.sh: you use eval with a parameter. I hope you will never use this in a script run through sudo :)
create2.sh "plop; rm -rf /; echo" would be fun...
leto:~% ./create3.sh this_is_a_very_long_filename_designed_to_exceed_the_shell_argument_length_limit 1 2000
./create3.sh: line 12: /bin/touch: Argument list too long
You can fix this with:
seq -f "$prefix-%04g.txt" $start $end | xargs touch
Have you ever tried ksh93? On a Mac Book Pro with a 2.2GHz Core2Duo with file creation on a RAMDisk I measured the following results (this script does not fork):
$ cat create.ksh
#!/bin/ksh
(( $# != 3 )) && { print -u2 "usage: $0 prefix start# end#"; exit 1; }
prefix=$1
integer start=$2
integer end=$3
integer i
for ((i=start; i<=end; i++)) {
: >> $prefix-$i.txt
}
$ time ./create.ksh file 1 1000
real 0m0.103s
user 0m0.028s
sys 0m0.071s
$ rm file-*
Thanks for all the comments.
CMoi: Yes, you are right, the 'eval' is pretty dangerous with input arguments like yours. I was trying to be smart :-)
Tet: Thanks for highlighting the 'Argument list too long' error. Indeed, xargs is the safest way to go. Normally I only switch to xargs when I encounter an error. Maybe I should stop this habit.
ST: My Cygwin does not come with ksh. After a slight modification of your Korn shell script, the run time on my Core Duo T7250 2GHz is about 2.3s. BTW, Cygwin does not have a RAM disk. Maybe I can try it on /tmp (a memory-based fs) in Solaris.
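Putting the two suggestions above together, a minimal sketch of create3.sh without 'eval' and with the xargs fix could look like this. The function name `create_xargs` is made up for illustration, and the sketch assumes the prefix contains no whitespace, quotes or '%' characters, since xargs splits its input on whitespace and seq treats the whole string as a format. I have not timed it.

```shell
#!/bin/sh
# Sketch: create <prefix>-NNNN.txt for NNNN = start..end using seq and xargs.
# create_xargs is a hypothetical helper name.
# Assumption: the prefix has no whitespace, quotes or '%' characters.
create_xargs() {
    prefix=$1
    start=$2
    end=$3
    # seq prints one zero-padded file name per line; xargs passes them to
    # touch in batches that fit within the argument length limit.
    seq -f "$prefix-%04g.txt" "$start" "$end" | xargs touch
}
# Example call: create_xargs file 1 1000
```

Because xargs runs touch again whenever a batch is full, a long prefix or a large range no longer fails with 'Argument list too long', and the number of touch processes stays small.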