Remove Hundreds of Thousands of Files, take 2
Normally the boilerplate for removing files with 'find' is:
find some-dir -name "*pattern*" -exec rm -f {} \;
This is very inefficient because find has to fork one 'rm' process for every file it removes, and creating a process takes time. If each fork takes 0.01s, it will take 1,000s (16+ min) just to create the 'rm' processes for 100,000 files.
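The estimate above is simple enough to write out as a small Python sketch. The function name is made up for illustration, and 0.01s per fork is the hypothetical figure from the text, not a measurement.

```python
def fork_overhead_seconds(n_files, fork_cost_s=0.01):
    """Estimate the time spent only on creating one 'rm' process per file."""
    # fork_cost_s is an assumed per-process cost, not a measured value
    return n_files * fork_cost_s


total = fork_overhead_seconds(100_000)
print(f"{total:.0f}s, about {total / 60:.1f} min")  # 1000s, about 16.7 min
```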
Below are the strace system call summaries for the 3 solutions (the Python way, the traditional find way with -exec, and find -delete), each deleting 17576 files (26*26*26). 'find -delete' is the clear winner, with the Python script close behind. See for yourself.
- Python way - 18647 system calls, 0.0896s run time
- find -exec rm - 843786 system calls, 43.801s run time
- find -delete - 17711 system calls, 0.0793s run time
Python way:
$ touch somefiles-{a..z}{a..z}{a..z}
$ strace -cf ./rm.py somefiles
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 81.86    0.073372           4     17576           unlink
 17.78    0.015938         590        27           getdents64
  0.09    0.000080           1        89           close
  0.08    0.000071           0       153           read
  0.08    0.000070           1       135        74 stat64
  0.07    0.000059           0       268       182 open
  0.04    0.000036           0       137           fstat64
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           chdir
  0.00    0.000000           0         9         9 access
  0.00    0.000000           0        12           brk
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0         4         2 readlink
  0.00    0.000000           0        50           munmap
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           mprotect
  0.00    0.000000           0         3           _llseek
  0.00    0.000000           0        68           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           getcwd
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0        74           mmap2
  0.00    0.000000           0         9           lstat64
  0.00    0.000000           0         1           getuid32
  0.00    0.000000           0         1           getgid32
  0.00    0.000000           0         1           geteuid32
  0.00    0.000000           0         1           getegid32
  0.00    0.000000           0         1         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         3           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.089626                 18647       269 total
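The rm.py script itself is not shown in this post. Going by the trace (a handful of getdents64 calls, then one unlink per file, with no clone or waitpid), it probably does something along these lines. This is only a guess at its shape: the function name remove_matching and the prefix-matching rule are assumptions, not the original script.

```python
import os


def remove_matching(directory, prefix):
    """Unlink every entry in `directory` whose name starts with `prefix`.

    Everything happens inside one process: the directory is listed once,
    then os.unlink() is called per matching file. No child process is
    created, which is why the trace has no clone/waitpid entries.
    Returns the number of files removed.
    """
    removed = 0
    # os.listdir returns the complete list up front, so deleting while
    # looping over it is safe
    for name in os.listdir(directory):
        if name.startswith(prefix):
            os.unlink(os.path.join(directory, name))
            removed += 1
    return removed
```

Calling remove_matching(".", "somefiles-") would remove the test files created by the touch command above and leave everything else in the directory alone.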
Traditional find -exec rm -f {} \;:
$ touch somefiles-{a..z}{a..z}{a..z}
$ strace -cf find . -name "somefiles-*" -exec rm -f {} \;
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.91   42.883595        2440     17576           waitpid
  1.30    0.571413          33     17576           clone
  0.55    0.241115          14     17576           unlinkat
  0.07    0.030407           0    105467           close
  0.04    0.017349           1     17577           fstatat64
  0.03    0.014306           1     17576     17576 _llseek
  0.03    0.012770           0     52737           open
  0.02    0.008407           0    140626           mmap2
  0.02    0.006971           0     17577           ioctl
  0.01    0.004180         182        23           getdents64
  0.01    0.004000           0    123033    105456 execve
  0.01    0.003189           0     52757           brk
  0.00    0.001418           0     17577           munmap
  0.00    0.001373           0     52735           fstat64
  0.00    0.000519           0     70311           mprotect
  0.00    0.000000           0     17580           read
  0.00    0.000000           0     52734     52734 access
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         2           uname
  0.00    0.000000           0     17581           fchdir
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0     17577           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0     17577           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   43.801012                843786    175767 total
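Dividing the totals in this trace by the number of files gives a rough per-file cost for the -exec approach. The numbers below are copied from the strace summary above, and the variable names are only illustrative. The failed execve calls work out to exactly six per file, which is presumably the PATH search for 'rm' before it is found, but that reading is an assumption.

```python
files = 17576             # 26*26*26 test files
total_calls = 843786      # "calls" total from the strace summary
total_seconds = 43.801012 # "seconds" total from the strace summary
execve_calls = 123033
execve_errors = 105456

calls_per_file = total_calls / files
ms_per_file = total_seconds / files * 1000
successful_execs = execve_calls - execve_errors  # one per 'rm', plus find itself
failed_execs_per_file = execve_errors / files

print(f"{calls_per_file:.1f} system calls per file")
print(f"{ms_per_file:.2f} ms per file")
print(f"{successful_execs} successful execve calls")
print(f"{failed_execs_per_file:.0f} failed execve calls per file")
```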
find -delete way:
$ touch somefiles-{a..z}{a..z}{a..z}
$ strace -cf find . -name "somefiles-*" -delete
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.20    0.069193           4     17576           unlinkat
 12.69    0.010070         438        23           getdents64
  0.10    0.000083           5        17           mmap2
  0.00    0.000000           0         4           read
  0.00    0.000000           0         9           open
  0.00    0.000000           0        11           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         6         6 access
  0.00    0.000000           0        29           brk
  0.00    0.000000           0         1           ioctl
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0         5           fchdir
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         7           fstat64
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0         1           fstatat64
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.079346                 17711         7 total