Monday, October 15, 2007

find find

In a recent performance analysis, I needed to find out how many files each directory and sub-directory contained, to determine whether any directory was heavily populated with files.

What I needed was to run find within find. I also needed to make sure the inner find would not descend into any sub-directory. On Linux, you can specify "-maxdepth 1" to limit the depth.
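To see what the depth limit changes, here is a minimal sketch that counts the files of one directory both ways. The function names count_direct and count_recursive are made up for illustration and are not part of the script below. Note that -maxdepth is an extension found in GNU and BSD find, not part of POSIX find.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Count regular files directly inside "$1" (sub-directories are not entered).
count_direct() {
    find "$1" -maxdepth 1 -type f | awk 'END { print NR }'
}

# Count regular files anywhere below "$1", including sub-directories.
count_recursive() {
    find "$1" -type f | awk 'END { print NR }'
}
```

For a directory holding two files plus a sub-directory with one file, count_direct reports 2 and count_recursive reports 3.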

First, I wrote a script to count the number of files (not directories) in a given directory.

$ cat /tmp/find_f.sh
#! /bin/sh

n=`find "$1" -maxdepth 1 -type f | awk 'END{print NR}'`
echo "$1 - $n"

Once that is done, I run the script above for every directory and sub-directory. To find the most populated directories, I simply pipe the output to "sort" with -n for a numeric sort on the third field (-k 3), then to "tail" to keep the ten largest.

$ find /usr/include -type d -exec /tmp/find_f.sh {} \; | sort -n -k 3 | tail
/usr/include/c++/4.1.1/java/awt - 104
/usr/include/evolution-data-server-1.8/camel - 112
/usr/include/c++/4.1.1/javax/swing - 124
/usr/include/c++/4.1.1/gnu/java/locale - 141
/usr/include/kde/dom - 179
/usr/include/boost/mpl - 182
/usr/include/gtk-2.0/gtk - 195
/usr/include - 239
/usr/include/linux - 331
/usr/include/kde - 456
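The same pipeline can also be run without a separate script file by handing the inner find to "sh -c". The following is only a sketch: the function name top_dirs and its optional second argument are assumptions for illustration. Like the pipeline above, it relies on "sort -k 3", so it assumes directory names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# top_dirs ROOT [N]: print the N directories under ROOT that directly hold
# the most regular files, smallest count first (N defaults to 10).
top_dirs() {
    find "$1" -type d -exec sh -c '
        # Inside this inline script, "$1" is the directory passed by -exec.
        n=$(find "$1" -maxdepth 1 -type f | awk "END { print NR }")
        printf "%s - %s\n" "$1" "$n"
    ' sh {} \; | sort -n -k 3 | tail -n "${2:-10}"
}
```

Calling "top_dirs /usr/include" should give output in the same "path - count" form as the listing above.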

1 Comments:

Blogger chihungchan said...

It can be more efficient without executing another script:

find /usr/include -type f | awk -F"/" '{r=sprintf("/%s$",$NF)
sub(r,$1);++d[$0]}END{for(i in d){print i,d[i]}}' | sort -n -k 2
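One caveat with the one-liner above: it builds a regular expression from each file name, so a name containing metacharacters such as "+" or "[" may not be stripped from its path as intended. A possible variant, sketched here with a fixed pattern that removes everything after the last "/", avoids that. The function name files_per_dir is made up for illustration. As with the one-liner, directories that contain no files do not appear in the output.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# files_per_dir ROOT: print "DIR COUNT" for every directory under ROOT that
# directly contains at least one regular file, sorted by count.
files_per_dir() {
    find "$1" -type f | awk '
        { sub(/\/[^\/]*$/, ""); ++d[$0] }   # drop the trailing "/filename"
        END { for (i in d) print i, d[i] }
    ' | sort -n -k 2
}
```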

1:12 PM  
