Saturday, August 16, 2008

Recursive SCP, An Efficient Way

My colleagues were trying to "scp" (secure copy) a directory from one RHEL4 box to another and realised that copying so many files was going to take a while. In the end they decided to "tar cvfzp" the directory, scp the gzipped tarball over, and unpack it on the other server.

At the back of my mind I was wondering whether we could use a UNIX pipe to do all of this. Not only would I avoid creating a temporary tarball, it should also be more efficient: a single continuous stream of data over TCP/IP lets the connection keep its window size at the maximum.

The experiment below copied my Cygwin tmp directory (525K bytes in total, with 59 files and 5 sub-directories) to my office CentOS5 box over a broadband connection.

$ time scp -r tmp chihung@$MY_CENTOS5:. > /dev/null

real    0m16.828s
user    0m0.138s
sys     0m0.139s

$ tar cfzp - ./tmp | time ssh chihung@$MY_CENTOS5 tar xfzp -
0.07user 0.06system 0:04.12elapsed 3%CPU (0avgtext+0avgdata 413440maxresident)k
0inputs+0outputs (1635major+0minor)pagefaults 0swaps

You can see that we are talking about 16.828 seconds vs 4.12 seconds. Also, the 'tar' way compresses the stream and preserves the file permissions. The ssh connection above was set up to be password-less so that login time is not counted.
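
The pipe itself can be tried out without a second machine. Below is a minimal sketch, assuming a tar that understands -z (GNU tar, as used above). The scratch directory and the file names are made up for illustration, and a local subshell stands in for the ssh end of the pipe.

```shell
#!/bin/sh
# Local dry run of the tar pipe. The second subshell plays the role of
# "ssh user@host tar xfzp -"; all paths below are hypothetical.
set -e

work=$(mktemp -d)
mkdir -p "$work/src/tmp/sub" "$work/dst"
echo "hello" > "$work/src/tmp/sub/a.txt"
printf '#!/bin/sh\necho hi\n' > "$work/src/tmp/run.sh"
chmod 750 "$work/src/tmp/run.sh"

# Sender: write a gzip'ed tar stream to stdout ("-" is the archive name).
# Receiver: read the stream from stdin and unpack it, keeping permissions (p).
(cd "$work/src" && tar cfzp - ./tmp) | (cd "$work/dst" && tar xfzp -)

# diff -r prints nothing and returns 0 when the two trees have the same content.
diff -r "$work/src/tmp" "$work/dst/tmp" && echo "trees match"
ls -l "$work/dst/tmp"
```

To go over the network, the second subshell is replaced by the ssh command shown earlier. No tarball is written to disk on either side.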

Not all UNIX systems come with a GNU tar that can do gzip (-z); Solaris is one of them. We can get the same effect as GNU tar by combining gzip and gunzip at the two ends. Also, to check that the data transferred is intact, you can run md5sum on the tar stream.

$ tar cfp - ./tmp | gzip | time ssh chihung@$MY_CENTOS5 "gunzip | tar xfp -"

0.09user 0.04system 0:04.09elapsed 3%CPU (0avgtext+0avgdata 413184maxresident)k
0inputs+0outputs (1634major+0minor)pagefaults 0swaps

$ tar cf - ./tmp | md5sum
d46a7b5985d0ea408186222c0257405f  -
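
One way to compare checksums on both ends without a temporary tarball is to tap the stream with tee. This is only a sketch under my own assumptions: it runs locally, with "gzip | gunzip" standing in for the network hop, and the scratch directory, fifo names and .md5 file names are made up. It needs mkfifo, tee and md5sum on the machine.

```shell
#!/bin/sh
# Checksum the uncompressed tar stream on the sending and receiving side
# while it is being copied. All paths below are hypothetical.
set -e

work=$(mktemp -d)
mkdir -p "$work/src/tmp" "$work/dst"
echo "some data" > "$work/src/tmp/file.txt"

# One fifo per side; a background md5sum reads whatever tee copies into it.
mkfifo "$work/send.fifo" "$work/recv.fifo"
md5sum < "$work/send.fifo" > "$work/send.md5" &
md5sum < "$work/recv.fifo" > "$work/recv.md5" &

# tee passes the stream along unchanged and also feeds the checksum.
(cd "$work/src" && tar cfp - ./tmp) | tee "$work/send.fifo" \
  | gzip | gunzip \
  | tee "$work/recv.fifo" | (cd "$work/dst" && tar xfp -)

# Wait for both background md5sum jobs before reading their results.
wait

cat "$work/send.md5" "$work/recv.md5"
if cmp -s "$work/send.md5" "$work/recv.md5"; then
    echo "checksums match"
else
    echo "checksums differ"
fi
```

Over ssh, the "gunzip | tee | tar" half would run inside the quoted remote command, and the remote checksum would have to be fetched back for comparison.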
