Saturday, April 14, 2007

mkdir, the limit

A friend of mine hit the limit of 32765 sub-directories in a single directory. The Solaris 8 documentation describes it: "Too many links. Cause: An attempt was made to create more than the maximum number of hard links (LINK_MAX, by default 32767) to a file. Because each subdirectory is a link to its parent directory, the same error results from trying to create too many subdirectories."

So the question is: is there anything we can do? I don't think so, because this is a built-in limit in Solaris. What we can do is ask ourselves why we need that many sub-directories, and whether the flat directory structure can be changed to a hierarchical one. It is pretty obvious that too many files or directories in one folder will cause performance problems when an application interacts with it.
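One way to go from a flat layout to a hierarchical one, sketched here under assumptions, is to group numerically named entries into buckets of a fixed size so that no single directory gets near the limit. The names `bucket_path` and `make_bucketed_dir` and the bucket size of 1000 are hypothetical choices for illustration. The sketch also assumes a shell with arithmetic expansion such as bash or ksh.

```shell
# Hypothetical helpers: map a numeric name to a two-level path so that
# each bucket directory holds at most BUCKET_SIZE sub-directories.
BUCKET_SIZE=1000   # assumed value; anything well below LINK_MAX works

bucket_path() {
    n=$1
    # Integer division picks the bucket, e.g. 32766 -> bucket 32
    echo "$((n / BUCKET_SIZE))/$n"
}

# Create the directory for one name under a given root; mkdir -p also
# creates the bucket directory if it does not exist yet.
make_bucketed_dir() {
    root=$1
    n=$2
    mkdir -p "$root/$(bucket_path "$n")"
}
```

For example, `make_bucketed_dir data 32766` would create `data/32/32766`, and the top-level directory `data` ends up with only a few dozen sub-directories.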

Let's do an experiment just to convince ourselves that there is indeed a limit:

$ uname -a
SunOS myhost 5.9 Generic_118558-11 sun4u sparc SUNW,Sun-Fire-V240

$ psrinfo -v
Status of processor 0 as of: 04/14/2007 12:16:12
  Processor has been on-line since 10/16/2006 09:10:39.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.
Status of processor 1 as of: 04/14/2007 12:16:12
  Processor has been on-line since 10/16/2006 09:10:38.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.

$ mkdir test1

$ cd test1

$ time for i in `perl -e '$,=" ";print 1..32768'`
 do
 mkdir $i
 done
mkdir: Failed to make directory "32766"; Too many links
mkdir: Failed to make directory "32767"; Too many links
mkdir: Failed to make directory "32768"; Too many links

real    2m52.716s
user    0m40.830s
sys     2m7.210s

$ cd ..

$ perl
$dir="test1";
($x,$x,$x,$nlink)=stat($dir);
print $nlink,"\n";
 
32767

As we can see, the maximum number of sub-directories one can create is 32765. The link count of test1 is 32767 because the directory starts with two links, its entry in the parent directory and its own (.) entry, and every sub-directory adds one more through its (..) entry.
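The arithmetic can be written down as a small sketch. The function names are hypothetical, and a shell with arithmetic expansion (bash or ksh) is assumed. `getconf LINK_MAX` with a path argument is the POSIX way to query the limit for the filesystem holding that path, but the value it reports depends on the filesystem, so it is shown only as a comment.

```shell
# Link count of a directory = 2 (its entry in the parent, plus its own ".")
#                           + 1 for the ".." entry of each sub-directory.
expected_nlink() {
    subdirs=$1
    echo $((subdirs + 2))
}

# Largest number of sub-directories that fit under a given link limit.
max_subdirs() {
    link_max=$1
    echo $((link_max - 2))
}

max_subdirs 32767      # prints 32765
expected_nlink 32765   # prints 32767

# On a live system the limit could be queried instead of hard-coded:
#   max_subdirs "$(getconf LINK_MAX test1)"
```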

Let's sidetrack a little bit and look at this from the performance angle. It takes almost 3 minutes to create 32765 sub-directories. Can it be faster? Let's see what interpreted languages like Tcl and Perl can offer.

The perl way:

$ perl -v

This is perl, v5.6.1 built for sun4-solaris-64int
(with 48 registered patches, see perl -V for more detail)

Copyright 1987-2001, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

$ mkdir test2

$ cd test2

$ time perl
$|=1;
$start=1;
$end=32768;
for ($i=$start; $i<=$end; $i++) {
    mkdir($i);
}
 
real    0m21.098s
user    0m0.180s
sys     0m7.740s

Wow, we are talking about an 8 times speed-up. How about my favourite, Tcl? The Tcl way:

$ cat a.tcl
#! /usr/sfw/bin/tclsh8.3

for { set i 1 } { $i <= 32768 } { incr i } {
        file mkdir $i
}

$ mkdir test3

$ cd test3

$ time ../a.tcl
can't create directory "32766": too many links
    while executing
"file mkdir $i"
    ("for" body line 2)
    invoked from within
"for { set i 1 } { $i <= 32768 } { incr i } {
        file mkdir $i
}
"
    (file "../a.tcl" line 3)

real    0m20.796s
user    0m0.900s
sys     0m8.040s

Wow, I am impressed that Tcl 8.3 (latest is 8.4) is as good as Perl 5.6.1 (latest is 5.8.8).
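
For reference, the speed-up figures can be recomputed from the "real" times reported above. This is only a sketch: it assumes the `XmY.YYYs` format printed by the shell's `time` keyword and a POSIX awk (nawk on older Solaris), and the helper names are made up for illustration.

```shell
# Convert a "2m52.716s"-style duration to seconds (hypothetical helper).
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two durations, rounded to one decimal place.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 2m52.716s 0m21.098s   # shell loop vs. Perl, prints 8.2
speedup 2m52.716s 0m20.796s   # shell loop vs. Tcl, prints 8.3
```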

BTW, removing that many sub-directories (3 x 32765, under test1, test2 and test3) took:

$ time /bin/rm -rf test1 test2 test3

real    0m35.425s
user    0m1.210s
sys     0m19.740s

Why can Perl and Tcl perform better than the shell script? Obviously, there is no fork or exec of a process per directory in Perl and Tcl, because they have the "mkdir" function built into their interpreters. So how much overhead are we talking about? Let's do another experiment.

$ truss mkdir newdir 2>&1 | wc -l
      48

$ truss mkdir u
execve("/usr/bin/mkdir", 0xFFBFFCF4, 0xFFBFFD00)  argc = 2
resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
resolvepath("/usr/bin/mkdir", "/usr/bin/mkdir", 1023) = 14
stat("/usr/bin/mkdir", 0xFFBFFAC8)              = 0
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
stat("/usr/lib/libgen.so.1", 0xFFBFF5D0)        = 0
resolvepath("/usr/lib/libgen.so.1", "/usr/lib/libgen.so.1", 1023) = 20
open("/usr/lib/libgen.so.1", O_RDONLY)          = 3
mmap(0x00010000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFF3A0000
mmap(0x00010000, 98304, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF380000
mmap(0xFF380000, 22677, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF380000
mmap(0xFF396000, 2343, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 24576) = 0xFF396000
munmap(0xFF386000, 65536)                       = 0
memcntl(0xFF380000, 6304, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
stat("/usr/lib/libc.so.1", 0xFFBFF5D0)          = 0
resolvepath("/usr/lib/libc.so.1", "/usr/lib/libc.so.1", 1023) = 18
open("/usr/lib/libc.so.1", O_RDONLY)            = 3
mmap(0xFF3A0000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000
mmap(0x00010000, 802816, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF280000
mmap(0xFF280000, 701788, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF280000
mmap(0xFF33C000, 24664, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 704512) = 0xFF33C000
munmap(0xFF32C000, 65536)                       = 0
memcntl(0xFF280000, 117372, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
stat("/usr/lib/libdl.so.1", 0xFFBFF5D0)         = 0
resolvepath("/usr/lib/libdl.so.1", "/usr/lib/libdl.so.1", 1023) = 19
open("/usr/lib/libdl.so.1", O_RDONLY)           = 3
mmap(0xFF3A0000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000
mmap(0x00002000, 8192, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3FA000
mmap(0xFF3FA000, 1894, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3FA000
close(3)                                        = 0
mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF370000
stat("/usr/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1", 0xFFBFF2E0) = 0
resolvepath("/usr/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1", "/usr/platform/sun4u-us3/lib/libc_psr.so.1", 1023) = 41
open("/usr/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1", O_RDONLY) = 3
mmap(0xFF3A0000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000
close(3)                                        = 0
getustack(0xFFBFF914)
getrlimit(RLIMIT_STACK, 0xFFBFF90C)             = 0
getcontext(0xFFBFF748)
setustack(0xFF343A5C)
brk(0x000245E8)                                 = 0
brk(0x000265E8)                                 = 0
umask(0)                                        = 077
umask(077)                                      = 0
mkdir("u", 0777)                                = 0
_exit(0)

OK, we now know that every 'mkdir' command makes 48 system calls. So there will be 32,765 x 48 = 1,572,720 system calls to create that many directories. In Tcl/Perl, it takes only 6 system calls to create a single directory, which is 8 times fewer. This tallies with our initial speed-up calculation. The truss output below, from a tclsh session, shows the 6 calls:

9364/1:         read(0, " f i l e   m k d i r   t".., 4096)     = 13
9364/1:         stat("t", 0xFFBFE910)                           Err#2 ENOENT
9364/1:         umask(0)                                        = 077
9364/1:         umask(077)                                      = 0
9364/1:         mkdir("t", 0700)                                = 0
9364/1:         write(1, " %  ", 2)                             = 2
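
If the work has to stay in the shell, the per-process start-up cost seen in the mkdir trace could be amortized by handing many names to a single mkdir process. The sketch below assumes a shell with arithmetic expansion (bash or ksh). `make_dirs_batched` and the default batch size of 500 are hypothetical, and it has not been timed on the machine above.

```shell
# Create directories named START..END inside DIR, passing up to BATCH
# names to each mkdir invocation instead of one name per process.
make_dirs_batched() {
    dir=$1 start=$2 end=$3 batch=${4:-500}
    i=$start
    while [ "$i" -le "$end" ]; do
        names=""
        n=0
        # Collect up to $batch names for one mkdir call
        while [ "$i" -le "$end" ] && [ "$n" -lt "$batch" ]; do
            names="$names $i"
            i=$((i + 1))
            n=$((n + 1))
        done
        # $names is intentionally unquoted so that it splits into arguments
        (cd "$dir" && mkdir $names) || return 1
    done
}
```

With the default batch size, `make_dirs_batched test4 1 32765` would start 66 mkdir processes rather than 32765, so the dynamic linking work shown in the trace is repeated far less often.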
