Sed is Slow for Very Long Stream in Solaris
Is it the nature of the problem that makes sed run this long, or is sed inefficient in certain circumstances?
In this exercise, I created a file with 2000 lines. The first line has 12 characters, and each subsequent line is 12 characters longer than the one before it, so the last line has 24000 characters.
sed 's/\\\\/@/g;s/\\/@/g'
took 35+ minutes on my Sun Fire V440. That's really inefficient. Okay, sed is definitely not the right tool for this job. Let's take a look at an alternative.
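As a quick sanity check on the test file described above, its size can be worked out by hand: line k holds k copies of a 12-character unit plus one newline. The sketch below does that arithmetic; the function name expected_bytes is made up for illustration.

```shell
# expected_bytes N: size in bytes of a file whose k-th line (k = 1..N)
# holds k copies of a 12-character unit followed by one newline.
# (hypothetical helper, not part of the original run.sh)
expected_bytes() {
    n=$1
    # 12 * (1 + 2 + ... + n) characters, plus n newlines
    echo $(( 12 * n * (n + 1) / 2 + n ))
}

expected_bytes 2000
```

For 2000 lines this gives 24014000, which agrees with the byte count that wc reports for run2000.txt.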
Perl has a "-p" flag that wraps your in-line code in a
while (<>) { ... # your script }
loop and prints each line after your code has run, so that you can write a one-liner. Guess what: Perl took only 5 seconds to finish that substitution. Hey, that's a lot of CPU cycles saved!
Here is the code and the run time info:
$ cat run.sh
#! /bin/bash
comma() {
    perl -e 'print "c:\\\\a\\\\b\\\\c,"x'${1:-1}
    echo ""
}
n=1
while [ $n -le $1 ]
do
    comma $n
    ((++n))
done
$ ./run.sh 2000 > run2000.txt
$ wc run2000.txt
2000 2000 24014000 run2000.txt
$ time sed 's/\\\\/@/g;s/\\/@/g' run2000.txt > run1.txt
real 35m6.692s
user 35m5.559s
sys 0m0.430s
$ time perl -pe 's/\\\\/@/g;s/\\/@/g' run2000.txt > run2.txt
real 0m4.948s
user 0m4.491s
sys 0m0.145s
$ digest -a md5 run1.txt run2.txt
(run1.txt) = 8820c914e0e038cec9da6f0883b6d964
(run2.txt) = 8820c914e0e038cec9da6f0883b6d964
$ uname -a
SunOS chihung 5.10 Generic_118822-11 sun4u sparc SUNW,Sun-Fire-V440
$ psrinfo -v
Status of virtual processor 0 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:43.
  The sparcv9 processor operates at 1281 MHz,
  and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:43.
  The sparcv9 processor operates at 1281 MHz,
  and has a sparcv9 floating point processor.
Status of virtual processor 2 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:43.
  The sparcv9 processor operates at 1281 MHz,
  and has a sparcv9 floating point processor.
Status of virtual processor 3 as of: 02/25/2009 00:14:28
  on-line since 12/13/2008 00:37:41.
  The sparcv9 processor operates at 1281 MHz,
  and has a sparcv9 floating point processor.
Labels: performance, Solaris, unix
2 Comments:
On my laptop running Linux, there is no such big difference between the performance of sed and perl in your experiment. See:
b@pet014204:~$ uname -a
Linux pet014204 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux
b@pet014204:~$ time ./run.sh 2000 >run2000.txt
real 0m15.492s
user 0m7.552s
sys 0m7.576s
b@pet014204:~$ wc run2000.txt
2000 2000 24014000 run2000.txt
b@pet014204:~$ time sed 's/\\\\/@/g;s/\\/@/g' run2000.txt > run1.txt
real 0m5.293s
user 0m5.120s
sys 0m0.088s
b@pet014204:~$ time perl -pe 's/\\\\/@/g;s/\\/@/g' run2000.txt > run2.txt
real 0m2.224s
user 0m2.156s
sys 0m0.060s
---
Could you repeat your test on some different systems?
Thanks for the info. It happened on my Solaris servers. I think Linux has better-tuned utilities than Solaris.