Saturday, July 25, 2009

Do What You Think Is Right For The User

I have been collecting disk utilisation for every server (df -k -F ufs; df -k -F vxfs) every day since February this year and appending the output to a file. Every line is carefully prefixed with a timestamp and hostname, like this:
2009-07-14 host1 /dev/dsk/c3t0d0s0    11098457 8391180 2596293   77%    /
2009-07-14 host1 /dev/dsk/c3t0d0s5    6050982 5175437  815036    87%    /var
2009-07-14 host1 /dev/dsk/c3t1d0s2    70573141 10437628 59429782 15%    /d2
2009-07-14 host1 /dev/dsk/c3t0d0s3    51294057 43390875 7390242  86%    /d1
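
For the record, the collection side is nothing fancy. A daily cron job along these lines will produce the format above (just a sketch; the log path and shell are assumptions, not my actual script):

#!/bin/sh
# Sketch of the daily collector: prefix every df data line with the
# date and hostname, then append to the log (log path is made up).
LOG=/var/tmp/mon-df.txt
DATE=`date '+%Y-%m-%d'`
HOST=`hostname`
for fstype in ufs vxfs
do
        df -k -F $fstype | nawk -v d="$DATE" -v h="$HOST" 'NR>1 {print d, h, $0}'
done >> $LOG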

I have no idea what I am going to do with the data, and the file just keeps growing. My gut feeling tells me it will be of some use one of these days. The file is now 21MB and 248,059 lines, and we are talking about 1,500+ lines of df output appended per day.

# ls -lh mon-df.txt
-rw-r--r--   1 root     root     21M  Jul 25 09:00 mon-df.txt

# wc -l mon-df.txt
  248059 mon-df.txt

# tail -1 mon-df.txt
2009-07-25 host1 /dev/dsk/c4t0d0s0  2097152   34406 1933831     2%    /some/dir

# grep -c 2009-07-25 mon-df.txt
1530

Recently, when I was talking to the users, I proposed a self-help web page for them to monitor their own partition sizes, and they just loved the idea. Initially the web page was just a text-based summary of the partitions above the threshold. After some thought, I realised I could visualise all the data I have been collecting using RRDtool. This will give them historical trending of each partition's growth.
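
On the RRDtool side, something like the following would set up one RRD per host/filesystem pair and graph it. This is only a sketch: the step, heartbeat, RRA size, colours and file names are assumptions, not my actual setup.

# one sample per day (step 86400), two gauges in GB, two years of dailies
rrdtool create host1_root.rrd --step 86400 \
        DS:total:GAUGE:172800:0:U \
        DS:used:GAUGE:172800:0:U \
        RRA:AVERAGE:0.5:1:730

# graph the last 180 days of used vs total
rrdtool graph host1_root.png --start -180d \
        DEF:used=host1_root.rrd:used:AVERAGE \
        DEF:total=host1_root.rrd:total:AVERAGE \
        AREA:used#FF9999:"Used (GB)" \
        LINE1:total#0000FF:"Total (GB)"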

My colleague advised me, based on his/her experience, not to do extra stuff for users because people will eventually ask for more. If I had taken his/her advice, all the data I collected would be useless. I am glad I did not take the advice. IMO, just do what you think is right for the user and everything will fall into place.

In this exercise, I created a CGI program to 'massage' the data before feeding it to RRDtool for graphing. It took about 1.8 seconds, which is a bit slow. In my CGI script, I tried to do everything in AWK (trying to be smart). Although AWK is very powerful, it is not fast enough when it comes to manipulating that much data, and a single AWK process cannot take advantage of multiple CPUs. By doing a fast grep (fgrep), which is damn fast for fixed-string searches, and piping the output to AWK, AWK does not have to handle nearly as many lines, and the CGI response time has been cut down to less than 0.8 seconds. When fgrep ... | nawk ... are combined in a pipe, two processes run together and will utilise 2 CPUs/cores if available. Kind of parallel shell script programming :-)

# time nawk -v host=$host -v fs=$fs -v rrdfile=$rrdfile '
$2==host && $8==fs {
        total=$4/(1024*1024)
        used=$5/(1024*1024)
        gsub("-","",$1)
        printf("update %s %s %.2lf %.2lf\n", rrdfile, $1, total, used)
}' mon-df.txt > /dev/null

real    0m1.764s
user    0m1.668s
sys     0m0.056s

# time fgrep $host mon-df.txt | time nawk -v host=$host -v fs=$fs -v rrdfile=$rrdfile '
$2==host && $8==fs {
        total=$4/(1024*1024)
        used=$5/(1024*1024)
        gsub("-","",$1)
        printf("update %s %s %.2lf %.2lf\n", rrdfile, $1, total, used)
}' > /dev/null

real        0.7
user        0.0
sys         0.0

real    0m0.752s
user    0m0.718s
sys     0m0.056s

# psrinfo | wc -l
       2
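
To complete the picture, the whole thing can be strung together in a single pipeline, since rrdtool reads 'update' commands from stdin when invoked as 'rrdtool -'. The sketch below is not my exact CGI code; in particular, the date must be converted into the epoch seconds that rrdtool update expects, and I am assuming perl does that conversion here.

fgrep $host mon-df.txt | nawk -v host=$host -v fs=$fs -v rrdfile=$rrdfile '
$2==host && $8==fs {
        total=$4/(1024*1024)
        used=$5/(1024*1024)
        split($1, d, "-")
        printf("%s %s %s %s %.2f %.2f\n", d[1], d[2], d[3], rrdfile, total, used)
}' | perl -MTime::Local -lane '
        # convert YYYY MM DD to epoch seconds (noon, to dodge DST boundaries)
        $t = timelocal(0, 0, 12, $F[2], $F[1]-1, $F[0]);
        print "update $F[3] $t $F[4] $F[5]";
' | rrdtool -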

Lessons learned:

  1. Do what you think is right for the user.
  2. Using the fastest command upfront to filter the data set before further processing gives you a performance advantage on large data sets.
  3. Having more than one command (cmd1 | cmd2 | cmd3 ...) work on the data can reduce run time on a multi-core/multi-CPU system.
