Thursday, February 18, 2010

Which Files Grow The Fastest

One of our file system mount points is growing at a rate of 100 KB per second, which is roughly 8.6 GB per day (100 KB × 86,400 seconds). At this rate the mount point will eventually hit 100% and the application will likely crash. There are a couple of options:
  1. Increase the size of the mount point. This solves nothing; it only defers the issue.
  2. Develop a housekeeping script to periodically clean up some of the old logs. This still does not solve the underlying problem.
  3. Find out which files grow the fastest and work with the owner to resolve the issue.
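
The arithmetic behind the growth figure can be sketched in a few lines of shell. This is only an illustration: it takes 1 KB as 1000 bytes, and the 50 GB of free space is a made-up figure, not a measurement from the system above. The function names `bytes_per_day` and `seconds_until_full` are hypothetical helpers.

```shell
#!/bin/sh
# Convert a growth rate in bytes per second into bytes per day.
# $1 = growth rate in bytes per second
bytes_per_day() {
    echo $(( $1 * 86400 ))
}

# Estimate how many seconds remain before a mount point fills up.
# $1 = free bytes, $2 = growth rate in bytes per second
seconds_until_full() {
    echo $(( $1 / $2 ))
}

rate=100000          # 100 KB/s, assuming 1 KB = 1000 bytes
free=50000000000     # hypothetical: 50 GB still free on the mount point

bytes_per_day "$rate"
seconds_until_full "$free" "$rate"
```

Dividing the second result by 86400 gives the number of days left, which is a handy figure to bring to the file owner.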

Guess what, I chose the last one 'cos it is technically challenging. It is not straightforward at first, because the file system holds a lot of files. By taking advantage of the "-newer" flag in the find command, I can touch a marker file and then locate every file modified after it within a certain interval. With two snapshots from find, I can work out the growth rate in terms of Bps (bytes per second).
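
To see the "-newer" trick in isolation, here is a minimal sketch that runs in a scratch directory. It assumes `mktemp -d` is available (it is not on every older Unix), and the names `modified_since`, `old.log` and `new.log` are made up for the demo.

```shell
#!/bin/sh
# List regular files under directory $1 that were modified after marker file $2.
modified_since() {
    find "$1" -type f -newer "$2" -print
}

work=$(mktemp -d)                 # scratch area for the demo
mkdir "$work/data"
echo "old" > "$work/data/old.log" # written before the marker exists
sleep 1
touch "$work/marker"              # the reference point in time
sleep 1
echo "new" > "$work/data/new.log" # written after the marker

result=$(modified_since "$work/data" "$work/marker")
echo "$result"                    # only new.log should be listed
rm -rf "$work"
```

The one-second sleeps are only there so the timestamps differ even on file systems with coarse time resolution.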

Here is the code:

#! /bin/ksh
#
# Find out the growth of files within a certain interval


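# Remove the marker file on exit. Signal 9 (KILL) cannot be trapped, and an
# "exit 0" inside the exit trap would mask the error codes used below.
trap 'rm -f "$tmpfile"' 0
trap 'exit 1' 1 2 3 15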


PATH=/usr/bin:/bin:/usr/sbin
LD_LIBRARY_PATH=/usr/lib:/lib


if [ $# -ne 2 ]; then
        echo "Usage: $0 <dir> <interval>"
        exit 1
fi
dir=$1
interval=$2
if [ ! -d "$dir" ]; then
        echo "Error. $dir does not exist"
        exit 2
fi


tmpfile="/tmp/${0##*/}.$$"
touch $tmpfile


sleep $interval
(
    find $dir -type f -newer $tmpfile -mount -ls | awk '{print "t1", $7, $11}'
    sleep $interval
    find $dir -type f -newer $tmpfile -mount -ls | awk '{print "t2", $7, $11}'
) | nawk -v interval=$interval '
$1=="t1" { t1[$3]=$2 }
$1=="t2" { t2[$3]=$2 }
END {
        for (i in t1) {
                d=t2[i]-t1[i]
                printf("%d Bps (Before:%d After:%d) %s\n", d/interval, t1[i], t2[i], i)
        }
}' | sort -n -k 1
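
The awk stage of the script can be tried on its own with canned snapshot lines, without waiting for a real interval. The sketch below is a variation, not the script itself: it uses plain `awk` instead of `nawk`, wraps the logic in a hypothetical `growth_report` function, and skips files that are missing from the second snapshot (the script above would report those with a negative rate). The `rotated.log` entry is invented to show that case.

```shell
#!/bin/sh
# Read "t1 <size> <path>" and "t2 <size> <path>" lines on stdin and print the
# growth rate of each file seen in both snapshots.
# $1 = seconds between the two snapshots
growth_report() {
    awk -v interval="$1" '
    $1 == "t1" { t1[$3] = $2 }
    $1 == "t2" { t2[$3] = $2 }
    END {
        for (f in t1) {
            if (!(f in t2)) continue   # file vanished between the snapshots
            printf("%d Bps (Before:%d After:%d) %s\n",
                   (t2[f] - t1[f]) / interval, t1[f], t2[f], f)
        }
    }' | sort -n -k 1
}

report=$(printf '%s\n' \
    't1 880056 /opt/app/domains/AppDomain/record.log' \
    't1 14040 /opt/app/App/logs/message20100218.log' \
    't1 100 /opt/app/App/logs/rotated.log' \
    't2 884334 /opt/app/domains/AppDomain/record.log' \
    't2 14820 /opt/app/App/logs/message20100218.log' \
    | growth_report 60)
echo "$report"
```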

Script in action:

# ./check-growth.ksh /opt/app 60
3 Bps (Before:4645717 After:4645924) /opt/app/domains/AppDomain/AppDomain.log
13 Bps (Before:14040 After:14820) /opt/app/App/logs/message20100218.log
71 Bps (Before:880056 After:884334) /opt/app/domains/AppDomain/record.log
415 Bps (Before:1282108337 After:1282133250) /opt/app/domains/AppDomain/audit.log
474 Bps (Before:514492 After:542938) /opt/app/App/logs/task.out
13675 Bps (Before:1203200386 After:1204020898) /opt/app/App/runs/nohup.out
43956 Bps (Before:47963888 After:50601297) /opt/app/App/logs/access20100218.log

Now we know which files contributed the most growth within a 1-minute interval. With this information, we can work with the application team to resolve the issue.
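
As a quick sanity check, the per-file rates can be added up and compared with the overall growth of the mount point. The sketch below feeds the sample output from the run above into a hypothetical `total_bps` helper; in practice the script's output would be piped in directly.

```shell
#!/bin/sh
# Sum the first column (the Bps figure) of the report lines read from stdin.
total_bps() {
    awk '{ sum += $1 } END { printf "%d\n", sum }'
}

total=$(printf '%s\n' \
    '3 Bps (Before:4645717 After:4645924) /opt/app/domains/AppDomain/AppDomain.log' \
    '13 Bps (Before:14040 After:14820) /opt/app/App/logs/message20100218.log' \
    '71 Bps (Before:880056 After:884334) /opt/app/domains/AppDomain/record.log' \
    '415 Bps (Before:1282108337 After:1282133250) /opt/app/domains/AppDomain/audit.log' \
    '474 Bps (Before:514492 After:542938) /opt/app/App/logs/task.out' \
    '13675 Bps (Before:1203200386 After:1204020898) /opt/app/App/runs/nohup.out' \
    '43956 Bps (Before:47963888 After:50601297) /opt/app/App/logs/access20100218.log' \
    | total_bps)
echo "$total Bps in total"
```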
