Tuesday, August 21, 2012

Remove Hundreds of Thousands of Files

If you need to remove tonnes of files in a directory, you will likely hit the "argument list too long" error when you try to "rm -f *.log.*". This happens because the shell expands the wildcard into the actual file names before running rm, and the resulting argument list exceeds ARG_MAX, the limit on the total size (in bytes) of the arguments and environment passed to a new program. On Linux, run "getconf ARG_MAX" to find out the limit. My Ubuntu reported an ARG_MAX of 2097152 bytes.
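To see why the expansion fails, you can roughly estimate the size of the expanded command line yourself. The sketch below reads the same limit from Python with os.sysconf("SC_ARG_MAX") and adds up the bytes of "rm", "-f" and every matching name (each plus its terminating NUL byte). The helper names argv_bytes and rm_would_overflow are hypothetical, and the count deliberately ignores the environment and the argv pointer array, so treat the result as an approximation rather than the kernel's exact check.

```python
import glob
import os


def argv_bytes(args):
    """Bytes the argument strings occupy when handed to execve(): each string
    plus its terminating NUL. The environment and the pointer array are not
    counted, so the real requirement is somewhat larger than this."""
    return sum(len(os.fsencode(arg)) + 1 for arg in args)


def rm_would_overflow(pattern):
    """Roughly estimate whether `rm -f <pattern>`, run in the current
    directory, would hit "argument list too long" once the shell expands
    the wildcard. Returns (too_long, arg_max)."""
    arg_max = os.sysconf("SC_ARG_MAX")  # should match `getconf ARG_MAX`
    # glob, like the shell, skips dot-files unless the pattern starts with "."
    argv = ["rm", "-f"] + glob.glob(pattern)
    return argv_bytes(argv) > arg_max, arg_max


if __name__ == "__main__":
    too_long, limit = rm_would_overflow("*.log.*")
    print("ARG_MAX is %d bytes; 'rm -f *.log.*' here would %s."
          % (limit, "likely fail" if too_long else "probably fit"))
```

With hundreds of thousands of names averaging a couple of dozen characters each, the total easily passes the 2097152 bytes mentioned above.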

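If you are in this situation, you are better off using a higher-level scripting language such as Python. A Python script lists the matching files itself and deletes each one with os.remove(), so no giant command line is ever built, and you do not have to exec a separate rm process for every file.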
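As a minimal sketch of that idea (Python 3.6 or later assumed), the function below walks a directory with os.scandir(), which yields entries lazily instead of building the whole name list up front, and calls os.remove() on every name matching "*substring*". The function name remove_matching is hypothetical, and this is an alternative to the glob-based script below, not the author's code. Like "rm -f", it leaves directories alone, and like the shell's "*", it skips dot-files.

```python
import fnmatch
import os


def remove_matching(directory, substring):
    """Delete non-directory entries in `directory` whose names match
    "*<substring>*", roughly `rm -f <directory>/*<substring>*` without any
    shell expansion. Returns the number of entries removed."""
    pattern = "*%s*" % substring
    removed = 0
    # os.scandir() reads the directory incrementally, so the names are never
    # collected into one huge list or passed to another process.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.startswith("."):
                continue  # the shell's * does not match dot-files either
            if not fnmatch.fnmatchcase(entry.name, pattern):
                continue  # case-sensitive, like shell globbing on Linux
            if entry.is_dir(follow_symlinks=False):
                continue  # rm -f without -r refuses directories
            os.remove(entry.path)  # one unlink() call, no rm process spawned
            removed += 1
    return removed
```

For example, remove_matching("/var/log/app", "201207") would delete the *201207* files in /var/log/app and return how many it removed.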

Here is my script to do this task efficiently:

#! /usr/bin/python

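import os, sys, glob

# Number of command-line arguments, including the script name
nargv = len(sys.argv)

if nargv == 2:
    # Only a pattern given: work in the current directory
    pattern = '*%s*' % sys.argv[1]
elif nargv == 3:
    # Pattern and directory given: change to that directory first
    pattern = '*%s*' % sys.argv[1]
    os.chdir(sys.argv[2])
else:
    print("Usage: %s pattern [directory]" % sys.argv[0])
    print("       eg. %s .log - to remove *.log* in current directory" % sys.argv[0])
    print("       eg. %s 201207 /var/log/app - to remove *201207* in /var/log/app directory" % sys.argv[0])
    print("")
    sys.exit(1)
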
for f in glob.glob(pattern):