Friday, July 16, 2010

One Million Files in a Directory Under Veritas File System

Recently my backup ran really slowly, with a throughput of 50 KBps. At that rate it could not finish within the backup window. After some investigation, I realised that one directory contained close to 1 million files. It literally takes minutes to do an "ls -l" there.

Although the file system allows you to have that many files in one directory, doing so is extremely inefficient for downstream activities such as backups.

Locating the problematic directory is a challenge with a shell script; a scripting language such as Perl is more appropriate. The find2perl utility, which comes with a standard Perl installation, generates the Perl equivalent of a find command:

$ find2perl /usr/include -type d
#! /usr/bin/perl -w
    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
        if 0; #$running_under_some_shell

use strict;
use File::Find ();

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

sub wanted;



# Traverse desired filesystems
File::Find::find({wanted => \&wanted}, '/usr/include');
exit;


sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);

    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
    -d _
    && print("$name\n");
}
Starting from this skeleton, I modified it to locate the sub-directory with the most entries:
#! /usr/bin/perl


use strict;
use File::Find ();
use Cwd 'abs_path';


# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;


my $max = 0;
my $maxpath;


sub wanted;


my $searchdir;
if ( $#ARGV == -1 ) {
        $searchdir=".";
} else {
        $searchdir=$ARGV[0];
}



# Traverse desired filesystems
File::Find::find({wanted => \&wanted}, $searchdir);
print $maxpath, ' ', $max, "\n";
exit;


sub wanted {
        my ($dev,$ino,$mode,$nlink,$uid,$gid);

        my $file;
        my $count=0;
        if ( (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) && -d _ ) {
                opendir(DIR, $_) or return;     # skip directories we cannot read
                while (defined($file=readdir(DIR))) {
                        ++$count;
                }
                closedir(DIR);
                if ( $count > $max ) {
                        $max = $count;
                        $maxpath = abs_path($_);
                }
        }
}
When I run it against /usr/share on my Ubuntu 10.04 Netbook Edition, it reports that /usr/share/foomatic/db/source/printer has 3258 entries (including . and ..):
$ ./files-in-directory-max.pl /usr/share
/usr/share/foomatic/db/source/printer 3258

So, what is the maximum number of files in a directory under VxFS? I found this using Google:
Recommended maximum number of files in a single Veritas File System (VxFS) directory is 100,000
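
To check a file system against that 100,000 figure, the script above could be adapted to report every directory over a limit instead of only the largest one. This is a minimal sketch rather than a tested production tool: the name dirs_over_limit, the optional second command-line argument, and the choice to leave . and .. out of the count are my own assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: return [path, count] pairs, largest first, for every
# directory under $root that holds more than $limit entries.
# Unlike the script above, . and .. are NOT counted here.
sub dirs_over_limit {
    my ($root, $limit) = @_;
    my @hits;
    File::Find::find({
        no_chdir => 1,                      # keep full paths in $File::Find::name
        wanted   => sub {
            my $path = $File::Find::name;
            # lstat so that symlinks to directories are not counted
            return unless lstat($path) && -d _;
            opendir(my $dh, $path) or return;   # skip unreadable directories
            my $count = 0;
            while (defined(my $entry = readdir($dh))) {
                next if $entry eq '.' || $entry eq '..';
                ++$count;
            }
            closedir($dh);
            push @hits, [$path, $count] if $count > $limit;
        },
    }, $root);
    return sort { $b->[1] <=> $a->[1] } @hits;
}

# Run the report only when executed as a script.
unless (caller) {
    my $root  = @ARGV ? $ARGV[0] : '.';
    my $limit = defined $ARGV[1] ? $ARGV[1] : 100_000;   # assumed default
    printf "%s %d\n", @$_ for dirs_over_limit($root, $limit);
}
```

Saved under a name of your choosing, it would be run with the directory first and the limit second; with no limit given it falls back to 100,000. Every directory is still read in full, so on a directory with a million entries the scan itself will take a while.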
