Saturday, July 25, 2009

Do What You Think Is Right For The User

I have been collecting disk utilisation for every server (df -k -F ufs; df -k -F vxfs) every day since Feb this year, appending the output to a file. Each line has been carefully prepended with the timestamp and hostname, like this:
2009-07-14 host1 /dev/dsk/c3t0d0s0    11098457 8391180 2596293   77%    /
2009-07-14 host1 /dev/dsk/c3t0d0s5    6050982 5175437  815036    87%    /var
2009-07-14 host1 /dev/dsk/c3t1d0s2    70573141 10437628 59429782 15%    /d2
2009-07-14 host1 /dev/dsk/c3t0d0s3    51294057 43390875 7390242  86%    /d1

I have no idea what I am going to do with the data, and the file just keeps growing. My gut feeling tells me that it will be of some use one of these days. Now the file is 21MB in size and 248,059 lines long, and we are appending 1,500+ lines of df output per day.

# ls -lh mon-df.txt
-rw-r--r--   1 root     root     21M  Jul 25 09:00 mon-df.txt

# wc -l mon-df.txt
  248059 mon-df.txt

# tail -1 mon-df.txt
2009-07-25 host1 /dev/dsk/c4t0d0s0  2097152   34406 1933831     2%    /some/dir

# grep -c 2009-07-25 mon-df.txt
1530

Recently, when I was talking to the users, I proposed a self-help web page for them to monitor their own partition sizes, and they just loved the idea. Initially the web page was just a text-based summary of those partitions above the threshold. After some thought, I realised I could visualise all the data I have been collecting using RRDtool. This will give them a historical trend of each partition's growth.
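For the trending, each host/filesystem pair can get its own RRD, fed once a day. A minimal sketch of how such an RRD might be created (the file name, data-source names and retention below are illustrative, not my exact setup):

# one RRD per host/filesystem pair; one sample a day (step 86400)
rrdtool create host1-root.rrd --start 20090201 --step 86400 \
    DS:total:GAUGE:172800:0:U \
    DS:used:GAUGE:172800:0:U \
    RRA:AVERAGE:0.5:1:730

One RRA of daily averages covering two years is enough to show the growth trend.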

My colleague advised me, based on his/her experience, not to do extra stuff for users because people will eventually ask for more. If I had taken that advice, all the data I collected would be useless. I am glad that I did not. IMO, just do what you think is right for the user and everything will fall into place.

In this exercise, I created a CGI program to 'massage' the data before feeding it to RRDtool for graphing. It took about 1.8 seconds, which is a bit slow. In my CGI script, I tried to do everything in AWK (trying to be smart). Although AWK is very powerful, it is not fast enough when it comes to manipulating this much data, and a single AWK process cannot take advantage of multiple CPUs. By doing a fast grep (fgrep), which is damn fast for fixed-string searches, and piping the output to AWK, AWK does not have to handle as many lines, and the CGI response time was cut down to less than 0.8 second. When fgrep ... | nawk ... are combined in a pipe, the shell spawns two processes that run together and will utilise 2 CPUs/cores if available. A kind of parallel shell-script programming :-)

# time nawk -v host=$host -v fs=$fs -v rrdfile=$rrdfile '
$2==host && $8==fs {
        total=$4/(1024*1024)
        used=$5/(1024*1024)
        gsub("-","",$1)
        printf("update %s %s %.2lf %.2lf\n", rrdfile, $1, total, used)
}' mon-df.txt > /dev/null

real    0m1.764s
user    0m1.668s
sys     0m0.056s

# time fgrep $host mon-df.txt | time nawk -v host=$host -v fs=$fs -v rrdfile=$rrdfile '
$2==host && $8==fs {
        total=$4/(1024*1024)
        used=$5/(1024*1024)
        gsub("-","",$1)
        printf("update %s %s %.2lf %.2lf\n", rrdfile, $1, total, used)
}' > /dev/null

real        0.7
user        0.0
sys         0.0

real    0m0.752s
user    0m0.718s
sys     0m0.056s

# psrinfo | wc -l
       2

Lessons learned:

  1. Do what you think is right for the user.
  2. Using the fastest command upfront to filter a large data set before processing gives you a performance advantage.
  3. Having more than one command (cmd1 | cmd2 | cmd3 ...) working on the data can reduce run time on a multi-core/multi-CPU system.


Introduction to Parallel Programming: Module 2: Multicore Processor Architectures

When I blogged about Module 1: Performance Tuning Video earlier this month, I promised to keep you posted when Module 2: Multicore Processor Architectures became available. As the author mentioned, it will be a 7-module training series on Introduction to Parallel Programming. So stay tuned for more.


Tuesday, July 21, 2009

Multicore Programming Lecture Series Videos

Cilk Arts has posted videos from their recent series of lectures detailing the various methodologies behind multicore programming. They’ve even posted the slide decks!
  1. Lecture 1
    • The multicore programming challenge
    • Shared-memory hardware
    • Leading concurrency platforms (Pthreads, OpenMP, TBB, Cilk++)
    • Race conditions
  2. Lecture 2
    • What is Parallelism?
    • Scheduling Theory
    • Cilk++ Runtime System
    • A Chess Lesson
  3. Lecture 3
    • Implementation of Cilk loops
    • Divide-&-Conquer Recurrences
    • Matrix Multiplication
    • Tableau Construction

Hope you like them all.

Auto Logout Those Idle Users

The CentOS hardening guide has this tip to auto-logout idle users: set up a readonly TMOUT variable (works in ksh and bash) in /etc/profile so that the user cannot change it. What a noble way of using a readonly variable.
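A minimal example for /etc/profile (the 900-second value is arbitrary; adjust to taste):

# log out idle ksh/bash sessions after 15 minutes;
# readonly stops users from overriding or unsetting it
TMOUT=900
readonly TMOUT
export TMOUT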


Computer Pioneers - Pioneer Computers Pt 2

Further to my posting of Computer Pioneers - Pioneer Computers Part 1, here is Part 2.


Saturday, July 18, 2009

New Trick Learned

Learned a new trick from the Shell Programming and Scripting board of the Unix and Linux Forums. Suppose you need to convert a single-column file to multiple columns with colon (:) as the separator:
$ cat a
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve

$ cat a | paste -d: - - -
one:two:three
four:five:six
seven:eight:nine
ten:eleven:twelve

$ cat a | paste -d: - - - -
one:two:three:four
five:six:seven:eight
nine:ten:eleven:twelve

$ cat a | paste -d: - - - - - -
one:two:three:four:five:six
seven:eight:nine:ten:eleven:twelve

In Unix, if "-" is specified as a file argument, the command takes its input from standard input (here, the pipe). In our case, each "-" given to paste consumes one line of input per output line. Just a recap: I blogged before that the paste command in Solaris has a limit of no more than 12 files.

$ (cat a a a) | paste -d: - - - - - - - - - -
one:two:three:four:five:six:seven:eight:nine:ten
eleven:twelve:one:two:three:four:five:six:seven:eight
nine:ten:eleven:twelve:one:two:three:four:five:six
seven:eight:nine:ten:eleven:twelve::::

$ (cat a a a) | paste -d: - - - - - - - - - - - -
one:two:three:four:five:six:seven:eight:nine:ten:eleven:twelve
one:two:three:four:five:six:seven:eight:nine:ten:eleven:twelve
one:two:three:four:five:six:seven:eight:nine:ten:eleven:twelve

$ (cat a a a) | paste -d: - - - - - - - - - - - - - -
paste: too many files- limit 12
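If you ever need more than 12 columns, here is a sketch of an nawk alternative with no such limit (the column count of 14 is just an example):

$ (cat a a a) | nawk -v n=14 '{printf("%s%s", $0, (NR % n) ? ":" : "\n")} END {if (NR % n) printf("\n")}'

It prints each input line followed by a colon, except every n-th line, which gets a newline instead.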


How to list only today's files

Sometimes you may have difficulty finding files created or modified today using the "find -mtime" command. What we can do is take advantage of the "-nt" (file newer than) conditional expression: by creating a reference file with a timestamp of exactly 00:00 today, we have something to compare against.
today="/tmp/today.$$"
touch -t `date '+%Y%m%d0000'` $today
for i in `find . -type f`
do
    if [ $i -nt $today ]; then
        echo $i
    fi
done
rm -f $today

BTW, "-nt" available only in Korn and Bash shell.


Wednesday, July 08, 2009

RRDtool Tips and Tricks

I am taking time off today and found time to think about how I can determine the top 5 network throughputs out of hundreds of RRD files. I stumbled upon this article: RRDtool Tips & Tricks. It is indeed a very useful paper, and some of the tips helped me summarise the data without having to do any programming. Look under the VDEF section.
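For example, here is a sketch of using VDEF to pull the maximum out of an RRD with no scripting at all (the file name and data-source name are my assumptions; graphing to /dev/null is just a trick to get the PRINT line):

$ rrdtool graph /dev/null \
    DEF:out=host1-net.rrd:output:AVERAGE \
    VDEF:peak=out,MAXIMUM \
    PRINT:peak:"peak throughput %.2lf"

Run one of these per RRD file and sort the output to get the top 5.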

The best tips are on page 2 and page 3:

Recipe for Success - Resolve the problems before anyone else finds them

... and talk about it - Being able is only half the story, the others must know too!


Saturday, July 04, 2009

Computer Pioneers - Pioneer Computers Part 1

Computer Pioneers - Pioneer Computers Part 1 is a video from the Computer History Museum, running 53:26. If you are interested in the first computer bug, fast forward to 36:00 for Grace Hopper's talk.


Performance Tuning Video

Sun Microsystems' High Performance Computing portal has just launched its first module of video training material.

The first module is Introduction to Parallel Programming. This 25-minute video provides a lot of tips for tackling performance tuning issues. If you tend to throw hardware at performance problems, you should watch this video with an open mind.

As the presenter mentioned, the next module will cover multi-core architecture. I will post the link once it is available.


Thursday, July 02, 2009

How to find cron queue size

If your system relies heavily on cron(1M) to schedule jobs, you may be interested to know how many cron jobs are running. In Solaris, the default queue size for crontab(1) jobs is 100; you can verify that in the source code.

When cron exceeds the limit, the log (/var/cron/log) reports it, and cron will not be able to schedule any more jobs until the queue size drops below the limit:

! c queue max run limit reached Thu Jul  2 21:22:00 2009
! rescheduling a cron job Thu Jul  2 21:22:00 2009

To find out how many child processes are running under cron, we need to know cron's pid. If your system runs Solaris containers (zones), ps -ef | grep cron will show more than one cron process. To determine the pid of your zone's cron exactly, pass the zone you are in to pgrep(1). Once we have cron's pid, we can do a ps listing to count all the child processes with that parent pid.

# pgrep -x -z `zonename` cron
5847

# ptree 5847
5847  /usr/sbin/cron
  13293 sh -c sleep 1000
    13305 sleep 1000
  13295 sh -c sleep 1000
    13306 sleep 1000
  13296 sh -c sleep 1000
    13307 sleep 1000
  ......
    ......

# ps -ef -o 'pid,ppid' | nawk -v ppid=5847 '$2==ppid{++s}END{print s}'
100

# tail /var/cron/log
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009

If you are interested in the cron queue size over time, you may want to put the above in a script and print out the queue size with a timestamp. gnuplot is a very useful tool for visualising time-based data.
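A minimal sketch of such a script, built from the same commands shown above (the log file path and 60-second interval are my assumptions):

#!/bin/ksh
# log a timestamp and the number of cron child processes every minute
pid=`pgrep -x -z \`zonename\` cron`
while true
do
    n=`ps -ef -o 'pid,ppid' | nawk -v ppid=$pid '$2==ppid{++s} END{print s+0}'`
    echo "`date '+%Y-%m-%d %H:%M:%S'` $n"
    sleep 60
done >> /var/tmp/cron-queue.log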


Wednesday, July 01, 2009

What Can You Do If You Do Not Win $1 Million From Netflix Prize Contest

My friend blogged about the recent winner who has just broken the 10% improvement barrier over the existing Netflix Cinematch algorithm.

Did you know that you can achieve a lot even if you are not the winner? "Just A Guy In A Garage" blogged about what's after netflix, although his score only ranked 18th on the leaderboard.
