Saturday, February 27, 2010

An Interview Question

I came across this interview question:
Imagine you are going off to be a sysadmin on a desert island, with no internet access, and further imagine that the previous sysadmin was a total fascist with a minimalist install policy. We're talking a bare-bones "classic" Solaris installation, or a minimal Debian system here. You've got SSH installed, but not much else. Before you hop on the boat, however, you are given a couple of hours of high-speed internet access and a USB stick. You can take up to 5 tools with you to this desert island: what do you pick?

I have been using this question in my recent interviews, and sad to say, none of the interviewees managed to impress me.

If I were the interviewee, I would download the following, because I have been using these utilities to build tools for my customers. Thanks to my current and former employers, 'cos they never have the budget to buy commercial tools.

Thursday, February 25, 2010

Finding the Latest Modified Time In A Mount Point

A user wanted to decommission a particular mount point but needed to ensure no files or directories were updated recently. So the question is: how can we find out exactly the latest modification time in this mount point?

My initial approach was to run a couple of find /mount/point -mtime -... -ls commands to roughly locate the last modified time, using divide and conquer: start with 100 days; if no file comes back, change to 200 days, otherwise 50 days, and so on. This approach can only give a rough estimate. If I could list the timestamps in the ISO 8601 format YYYYMMDDTHHMMSS, I could simply sort them and the latest modified time would be the last record. I know find is not able to do that, but find2perl can convert a find command to Perl code. With Perl, I can modify it to output the format that I want.

$ /usr/perl5/5.8.4/bin/find2perl /mount/point -ls
#! /usr/perl5/5.8.4/bin/perl -w
    eval 'exec /usr/perl5/5.8.4/bin/perl -S $0 ${1+"$@"}'
        if 0; #$running_under_some_shell

use strict;
use File::Find ();

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

sub wanted;
sub ls ();

my @rwx = qw(--- --x -w- -wx r-- r-x rw- rwx);
my @moname = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

my (%uid, %user);
while (my ($name, $pw, $uid) = getpwent) {
    $user{$uid} = $name unless exists $user{$uid};
}

my (%gid, %group);
while (my ($name, $pw, $gid) = getgrent) {
    $group{$gid} = $name unless exists $group{$gid};
}

# Traverse desired filesystems
File::Find::find({wanted => \&wanted}, '/mount/point');

sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);

    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
    ls;
}

sub sizemm {
    my $rdev = shift;
    sprintf("%3d, %3d", ($rdev >> 8) & 0xff, $rdev & 0xff);
}

sub ls () {
    my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
        $atime,$mtime,$ctime,$blksize,$blocks) = lstat(_);
    my $pname = $name;

    $blocks
        or $blocks = int(($size + 1023) / 1024);

    my $perms = $rwx[$mode & 7];
    $mode >>= 3;
    $perms = $rwx[$mode & 7] . $perms;
    $mode >>= 3;
    $perms = $rwx[$mode & 7] . $perms;
    substr($perms, 2, 1) =~ tr/-x/Ss/ if -u _;
    substr($perms, 5, 1) =~ tr/-x/Ss/ if -g _;
    substr($perms, 8, 1) =~ tr/-x/Tt/ if -k _;
    if    (-f _) { $perms = '-' . $perms; }
    elsif (-d _) { $perms = 'd' . $perms; }
    elsif (-l _) { $perms = 'l' . $perms; $pname .= ' -> ' . readlink($_); }
    elsif (-c _) { $perms = 'c' . $perms; $size = sizemm($rdev); }
    elsif (-b _) { $perms = 'b' . $perms; $size = sizemm($rdev); }
    elsif (-p _) { $perms = 'p' . $perms; }
    elsif (-S _) { $perms = 's' . $perms; }
    else         { $perms = '?' . $perms; }

    my $user = $user{$uid} || $uid;
    my $group = $group{$gid} || $gid;

    my ($sec,$min,$hour,$mday,$mon,$timeyear) = localtime($mtime);
    if (-M _ > 365.25 / 2) {
        $timeyear += 1900;
    } else {
        $timeyear = sprintf("%02d:%02d", $hour, $min);
    }

    printf "%5lu %4ld %-10s %3d %-8s %-8s %8s %s %2d %5s %s\n",
            $ino, $blocks, $perms, $nlink, $user, $group, $size,
            $moname[$mon], $mday, $timeyear, $pname;
}

Just remove the if/else block that reformats recent timestamps as HH:MM, and change the original printf to this one (note that localtime returns the year minus 1900, so it has to be added back):

printf("%04d%02d%02dT%02d%02d%02d %d %s\n", $timeyear + 1900, $mon + 1, $mday, $hour, $min, $sec, $mtime, $pname);
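With every record stamped in ISO 8601 form, a plain lexical sort is enough to surface the newest file. A quick self-contained demonstration of that property (the timestamps, epoch values, and paths below are made up for illustration):

```shell
# ISO 8601 timestamps sort chronologically under a plain text sort,
# so the last line after sorting is the latest modification.
printf '%s\n' \
    '20091231T235959 1262271599 /mount/point/a.log' \
    '20100225T134501 1267076701 /mount/point/b.log' \
    '20100101T000000 1262271600 /mount/point/c.log' \
    | sort | tail -1
```

This prints the b.log record, the most recently modified of the three.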

There is a lot of stuff in the find2perl output that we do not need. We can trim it down to just suit our need, which is to locate the latest modification time. My friend found a script on the Internet that does exactly this task. It is very efficient because it keeps track of the latest timestamp as it walks the tree.

use File::Find;
@ARGV = ('.') unless @ARGV;
my ($age, $name);
sub youngest {
    return if defined $age && $age > (stat($_))[9];
    $age = (stat(_))[9];
    $name = $File::Find::name;
}
find(\&youngest, @ARGV);
print "$name " . scalar(localtime($age)) . "\n";


CPU/Load Utilisation In A Gradient Plot

By taking advantage of RRDtool's CDEF, I am able to superimpose the CPU load utilisation over the original CPU utilisation graph. The gradient for Load/#CPUs runs from 0 to >2.0: 0 is green (system completely idle), 1 is yellow (100% utilised), and >2.0 is red (system heavily loaded). The gradient steps at intervals of 0.1, and the utilisation graph shows up pretty cool.

It is pretty hard to work with CDEF 'cos you have to program it in reverse Polish notation. Once you get one gradient correct, you just repeat it for the rest. Here are sample CDEFs showing a few of the gradients:

CDEF:load=cpu_load,UN,0,cpu_load,IF,cpu_total,UN,1,cpu_total,IF,/,1,GT,2,0,IF  \
CDEF:load10a=cpu_load,UN,0,cpu_load,IF,cpu_total,UN,1,cpu_total,IF,/,1.00,GT,cpu,0,IF \
CDEF:load10b=cpu_load,UN,0,cpu_load,IF,cpu_total,UN,1,cpu_total,IF,/,1.10,LT,cpu,0,IF \
CDEF:load10=load10a,0,EQ,0,load10a,0,EQ,0,load10b,IF,IF \
CDEF:load20a=cpu_load,UN,0,cpu_load,IF,cpu_total,UN,1,cpu_total,IF,/,1.10,GT,cpu,0,IF \
CDEF:load20b=cpu_load,UN,0,cpu_load,IF,cpu_total,UN,1,cpu_total,IF,/,1.20,LT,cpu,0,IF \
CDEF:load20=load20a,0,EQ,0,load20a,0,EQ,0,load20b,IF,IF \
CDEF:load30a=cpu_load,UN,0,cpu_load,IF,cpu_total,UN,1,cpu_total,IF,/,1.20,GT,cpu,0,IF \
CDEF:load30b=cpu_load,UN,0,cpu_load,IF,cpu_total,UN,1,cpu_total,IF,/,1.30,LT,cpu,0,IF  \
CDEF:load30=load30a,0,EQ,0,load30a,0,EQ,0,load30b,IF,IF \
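Each pair of CDEFs above is just a window test on the ratio: in infix terms, band load20 keeps the cpu value only when 1.10 < load/#CPUs < 1.20, and yields 0 otherwise. The same check written in plain awk with made-up sample values (this is only to illustrate the RPN, not part of the graph definition):

```shell
# ratio=1.15 falls inside the (1.10, 1.20) band, so cpu (80) is kept;
# outside the band the expression yields 0 and the band is not drawn.
echo "1.15 80" | awk '{
    ratio = $1; cpu = $2
    band = (ratio > 1.10 && ratio < 1.20) ? cpu : 0
    print band
}'
```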


Saturday, February 20, 2010

Reflections - The Value of Persistence

Got this from my "the other email"

Colonel Sanders went to more than 1,000 places trying to sell his chicken recipe before he found an interested buyer. The fact that we can buy Kentucky Fried Chicken today attests to his perseverance. Thomas Edison tried almost 10,000 times before he succeeded in creating the electric light. If he had given up, you would be reading this in the dark!

The original business plan for what was to become Federal Express was given a failing grade on Fred Smith's college exam. And, in the early days, their employees would cash their pay checks at retail stores, rather than banks. This meant it would take longer for the money to clear, thereby giving FedEx more time to cover their payroll.

Sylvester Stallone had been turned down a thousand times by agents and was down to his last $600 before he found a company that would produce Rocky. The rest is history! To truly succeed requires a total commitment to your goal. Too many people make the mistake of quitting just short of success. Keep going no matter what. If you really believe in what you are doing, give it all you've got and don't give up.

You will succeed. There is no such thing as failure. Every action produces an outcome. It may not always be the outcome you are looking for, but it is an outcome nonetheless. If you monitor the results of your actions and keep correcting what is not working, you will eventually produce the outcome you are looking for.

Be Persistent - Ray Kroc, the late founder of McDonalds, put it best when he said: "Nothing in this world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with great talent. Genius will not; un-rewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence, determination and love are omnipotent."


Which Files Grow The Fastest, Take 2

By expanding on my previous blog post, I am able to find out what percentage each of these files contributed to the file system growth. All I have to do is keep track of df -k $dir in the two snapshots within the interval.

Here is the script

#! /bin/ksh
# Find out the fastest growth of files within a certain interval

trap 'rm -f $tmpfile-*; exit 0' 0 1 2 3 15


if [ $# -ne 2 ]; then
        echo "Usage: $0 <dir> <interval>"
        exit 1
fi

dir=$1
interval=$2
tmpfile=/tmp/growth2.$$

if [ ! -d $dir ]; then
        echo "Error. $dir does not exist"
        exit 2
fi

touch $tmpfile-0

# first snapshot
sleep $interval
used1=`df -kl $dir | awk 'NR==2{print $3}'`
find $dir -type f -newer $tmpfile-0 -mount -ls | awk '{print "t1", $7, $11}' > $tmpfile-1

# second snapshot
sleep $interval
used2=`df -kl $dir | awk 'NR==2{print $3}'`
find $dir -type f -newer $tmpfile-0 -mount -ls | awk '{print "t2", $7, $11}' > $tmpfile-2

cat $tmpfile-1 $tmpfile-2 | nawk -v interval=$interval -v used1=$used1 -v used2=$used2 '
        BEGIN {
                # convert kilobyte to byte
                growth = (used2 - used1) * 1024
        }
        $1=="t1" { t1[$3]=$2 }
        $1=="t2" { t2[$3]=$2 }
        END {
                for (i in t1) {
                        d = t2[i] - t1[i]
                        percent = (growth > 0) ? d * 100 / growth : 0
                        printf("%d Bps (%.2lf %%) %s\n", d/interval, percent, i)
                }
        }' | sort -n -k 1

And the corresponding output:

474 Bps (0.70 %) /opt/app/App/logs/current/poll20100219.out
504 Bps (0.74 %) /opt/app/App/logs/current/app.log20100219.log
818 Bps (1.20 %) /opt/app/App/logs/current/notify20100219.log
2165 Bps (3.18 %) /opt/app/bea/user_projects/domains/appDomain/servers/recorder.log
39638 Bps (58.28 %) /opt/app/App/logs/current/xml.log

Now we can clearly identify that /opt/app/App/logs/current/xml.log contributed 58% of the file system growth.


Thursday, February 18, 2010

Which Files Grow The Fastest

One of the file system mount points is growing at the rate of 100 KB per second, that is 8.3 GB per day. With this growth rate, the mount point will eventually hit 100% and the application will likely crash. There are couple of options:
  1. Increase the size of the mount point; this does not solve the problem, it just defers the issue
  2. Develop a house-keeping script to periodically clean up some of the old logs; this still does not solve the problem
  3. Find out which files grow the fastest and work with the owner to resolve the issue

Guess what, I chose the last option 'cos it is technically challenging. It does not seem straightforward to begin with, because we have to deal with a lot of files in the file system. By taking advantage of the "-newer" flag in the find command, I can touch a file and locate any modified files newer than it within a certain interval. With two snapshots of find, I can work out the growth rate in terms of Bps (bytes per second).

Here is the code:

#! /bin/ksh
# Find out the growth of files within a certain interval

trap 'rm -f $tmpfile; exit 0' 0 1 2 3 15


if [ $# -ne 2 ]; then
        echo "Usage: $0 <dir> <interval>"
        exit 1
fi

dir=$1
interval=$2
tmpfile=/tmp/growth.$$

if [ ! -d $dir ]; then
        echo "Error. $dir does not exist"
        exit 2
fi

touch $tmpfile

(
        sleep $interval
        find $dir -type f -newer $tmpfile -mount -ls | awk '{print "t1", $7, $11}'
        sleep $interval
        find $dir -type f -newer $tmpfile -mount -ls | awk '{print "t2", $7, $11}'
) | nawk -v interval=$interval '
$1=="t1" { t1[$3]=$2 }
$1=="t2" { t2[$3]=$2 }
END {
        for (i in t1) {
                d = t2[i] - t1[i]
                printf("%d Bps (Before:%d After:%d) %s\n", d/interval, t1[i], t2[i], i)
        }
}' | sort -n -k 1

Script in action:

# ./check-growth.ksh /opt/app 60
3 Bps (Before:4645717 After:4645924) /opt/app/domains/AppDomain/AppDomain.log
13 Bps (Before:14040 After:14820) /opt/app/App/logs/message20100218.log
71 Bps (Before:880056 After:884334) /opt/app/domains/AppDomain/record.log
415 Bps (Before:1282108337 After:1282133250) /opt/app/domains/AppDomain/audit.log
474 Bps (Before:514492 After:542938) /opt/app/App/logs/task.out
13675 Bps (Before:1203200386 After:1204020898) /opt/app/App/runs/nohup.out
43956 Bps (Before:47963888 After:50601297) /opt/app/App/logs/access20100218.log

Now we have found out which files contributed the most within a 1-minute interval. With this information, we can work with the application team to resolve the issue.
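The heart of the script is the labelled two-snapshot join in the nawk block. Run against canned input (a made-up file name and sizes; plain awk is used here so the demo runs anywhere), it shows the rate computation in isolation:

```shell
# Two labelled snapshots of one file, 60 seconds apart: the size went
# from 1000 to 7000 bytes, i.e. (7000-1000)/60 = 100 Bps.
printf '%s\n' 't1 1000 /var/log/a.log' 't2 7000 /var/log/a.log' \
    | awk -v interval=60 '
        $1=="t1" { t1[$3]=$2 }
        $1=="t2" { t2[$3]=$2 }
        END {
            for (i in t1)
                printf("%d Bps (Before:%d After:%d) %s\n",
                       (t2[i]-t1[i])/interval, t1[i], t2[i], i)
        }'
```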


Saturday, February 13, 2010

Is High CPU Utilisation A Bad Sign ?

Is high CPU utilisation a bad sign? Can a single CPU utilisation graph tell you the whole story? In order to tell the real story, one needs to piece other performance data together.

Here is a typical monthly graph of the total CPU utilisation and memory utilisation put together on the same scale (percentage). You can see that this server has sufficient memory. As for the CPU, it hits 90-100% in the early hours every day. As we all know, CPU utilisation maxes out at 100%, so we have no idea whether the demand exceeded that mark.

It is the load average that tells you the amount of work your server performs. With RRDtool's CDEF, you can flag when the CPU load exceeds the number of CPUs. The CDEF definition is in reverse Polish notation, which may take a while to get used to. Here is the same graph with additional information (indicated in red) showing when the load is higher than the number of CPUs.

It is definitely possible to use various colour codes to indicate different levels of load, e.g. 1.0 < load/#CPUs < 1.5, 1.5 <= load/#CPUs < 2.0, ... They can all be done with CDEFs.
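As a sketch of the red-overlay idea described above (the DS names cpu_load and num_cpus, the RRD file name perf.rrd, and the colours are all assumptions for illustration, not taken from the actual graph definition), the flagging could look something like:

```shell
# Draw a full-height red area wherever load exceeds the number of CPUs:
# cpu_load,num_cpus,GT yields 1 when load > #CPUs, and IF maps that to 100.
rrdtool graph cpu.png \
    DEF:cpu_load=perf.rrd:cpu_load:AVERAGE \
    DEF:num_cpus=perf.rrd:num_cpus:AVERAGE \
    CDEF:overload=cpu_load,num_cpus,GT,100,0,IF \
    AREA:overload#FF0000:"load exceeds #CPUs"
```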


Thursday, February 04, 2010

My Solaris Cannot Fork

Recently one of my Solaris servers could not fork due to "Resource temporarily unavailable". The errors were flagged in the system messages via syslog.

Although my ssh login session was still active, I was not able to run any commands (like ls, w, cat, ...) 'cos they need to be forked from the shell. In this kind of scenario you are basically handicapped. After some searching on the web, I found a link that showed how to simulate other commands using shell built-ins; with these, no forking is involved. Basically it takes advantage of loop constructs and file redirection.

I am going to show you a few things you can do with built-in functions that may help you save the day. I believe they can do more than what I describe here. BTW, they run on the Korn shell.

Equivalent of "cat":

        while read line
        do
                echo "$line"
        done < $1
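Under normal conditions you can sanity-check this loop against the real cat (the scratch file and the mktemp call are only test scaffolding; in the no-fork situation you would type the loop directly with an existing file):

```shell
f=$(mktemp)
printf 'line one\nline two\n' > "$f"
# The built-in loop: read and echo never leave the shell process.
while read line
do
        echo "$line"
done < "$f"
rm -f "$f"
```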

Equivalent of "wc":

        nl=0 nw=0 nc=0
        while read line
        do
                set -- $line
                # count lines, words, and characters (+1 per newline)
                nl=$((nl + 1))
                nw=$((nw + $#))
                nc=$((nc + ${#line} + 1))
        done < $in
        echo "\t$nl\t$nw\t$nc\t$in"

Equivalent of "ls"

        echo $@ | while read i
        do
                echo $i
        done

In such a situation, you may want to count the total number of files opened on the system (based on /proc/*/fd/*, excluding stdin/stdout/stderr):

        n=0
        for fd in /proc/[0-9]*/fd/*
        do
                # exclude stdin/stdout/stderr (0/1/2)
                if [ ${fd##*/} -gt 2 ]; then
                        n=$((n + 1))
                fi
        done
        echo $n
