Tuesday, August 21, 2012

Remove Hundreds of Thousands of Files

If you need to remove a huge number of files from a directory, you will likely hit the "Argument list too long" error when you run "rm -f *.log.*". The shell expands the wildcard into the actual filenames before running rm, and the resulting argument list exceeds ARG_MAX, the system limit on the total size of the arguments passed to a new process. On Linux, run "getconf ARG_MAX" to find out the limit. My Ubuntu showed 2097152 (2 MiB) as my ARG_MAX.
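To make the limit concrete, here is a minimal Python sketch that reads the same value "getconf ARG_MAX" reports and roughly estimates whether a set of filenames would fit in one argument list. The helper name fits_on_command_line is hypothetical, and the estimate is optimistic: the real limit also covers environment variables and per-argument overhead, so treat it as a rough check only.

```python
import os

# Same value that "getconf ARG_MAX" reports (Unix only).
ARG_MAX = os.sysconf("SC_ARG_MAX")


def fits_on_command_line(names, limit=ARG_MAX):
    """Rough estimate: do these names fit in one argument list?

    Counts each name's bytes plus a terminating NUL. The real limit also
    covers the environment and pointer overhead, so this is optimistic.
    """
    total = sum(len(os.fsencode(name)) + 1 for name in names)
    return total < limit


if __name__ == "__main__":
    names = os.listdir(".")
    print("ARG_MAX:", ARG_MAX)
    print("fits:", fits_on_command_line(names))
```

If the check reports that the names do not fit, a plain "rm -f *" in that directory is very likely to fail.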

If you are in this situation, you are better off using a high-level scripting language such as Python. With Python, the files are deleted from within a single process, so you do not have to 'exec' 'rm' for every file.

Here is my script to do this task efficiently:

#! /usr/bin/python


import os, sys, glob


# The pattern is always wrapped in wildcards; the directory is optional.
nargv = len(sys.argv)
if nargv == 2:
    pattern = '*%s*' % sys.argv[1]
    basedir = os.getcwd()
elif nargv == 3:
    pattern = '*%s*' % sys.argv[1]
    basedir = sys.argv[2]
else:
    print("Usage: %s pattern [directory]" % sys.argv[0])
    print("       eg. %s .log - to remove *.log* in current directory" % sys.argv[0])
    print("       eg. %s 201207 /var/log/app - to remove *201207* in /var/log/app directory" % sys.argv[0])
    print("")
    sys.exit(1)

# Delete every match from inside this one process; no 'rm' is spawned.
os.chdir(basedir)
for f in glob.glob(pattern):
    os.remove(f)
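
The script above builds the full list of matches in memory before deleting anything. For very large directories, a streaming variant may be gentler. The following is only a sketch, not the script above rewritten: it assumes Python 3.6 or later (for os.scandir as a context manager), and the function name remove_matching is hypothetical. It skips directories and symlinks, which the original loop would try to remove. Note also that, unlike glob, fnmatch matches names that start with a dot.

```python
import fnmatch
import os


def remove_matching(pattern, basedir="."):
    """Delete regular files in basedir whose names match a glob pattern.

    Entries are streamed with os.scandir, so the full listing is never
    held in memory. Returns the number of files removed.
    """
    removed = 0
    with os.scandir(basedir) as entries:
        for entry in entries:
            # Skip directories, symlinks and other non-regular entries.
            if not entry.is_file(follow_symlinks=False):
                continue
            if fnmatch.fnmatch(entry.name, pattern):
                os.remove(entry.path)
                removed += 1
    return removed
```

Calling remove_matching("*201207*", "/var/log/app") would mirror the second usage example of the script.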