Wednesday, May 14, 2008

SGE Accounting

Sun Grid Engine ships with a very nice utility, qacct, to summarise the accounting information of all your grid jobs (the file format is described in accounting(5)). In my recent blog post about SGE for rendering, I recommended including the "-A" flag (accounting string) and the "-P" flag (project name) in qsub. This helps you extract the accounting information for a particular job. Now it is time to reap the benefits. The examples below show how to extract the total accounting information per project and per accounting string. In my case, the accounting string and the job name are both the scene file name without the file extension.
$ qacct -P myproject
PROJECT     WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
========================================================================================================================
myproject    33970465      47273362       1178602      48638827       56454795.153              0.000              0.000

$ qacct -A Scene1
Total System Usage
    WALLCLOCK         UTIME         STIME           CPU             MEMORY                 IO                IOW
================================================================================================================
       101114        169445          4951        174490         216975.065              0.000              0.000

In my recent rendering project, I had to deal with 1000+ scene files; to be exact, 1224 unique scene files. We could find the run-time information of every single job by looping through the accounting strings and running qacct on each one. However, that is very inefficient because the accounting file has to be read 1224 times. Also, the output format cannot be imported directly into a spreadsheet program for further analysis.

$ awk -F: '$32~/^myproject$/{print $7}' accounting | sort | uniq | wc -l
1224

$ for i in $(awk -F: '$32~/^myproject$/{print $7}' accounting | sort | uniq)
do
 echo "$i"
 qacct -A "$i"
done

accounting(5) clearly documents every field in the accounting file. We can write an awk program that reads the file once and prints all this information as CSV (comma-separated values).

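#! /bin/sh

# Summarise SGE accounting records per scene (accounting string) as CSV.
awk '
BEGIN {
        FS=":"
        OFS=","
        # print fractional totals (e.g. memory) in full instead of the default %.6g
        OFMT="%.3f"
}
# field numbers follow accounting(5): $32 is the project, $7 the accounting string
$32 ~ /^myproject$/ {
        scene=$7
        wallclock[scene]+=$14
        utime[scene]+=$15
        stime[scene]+=$16
        cpu[scene]+=$37
        mem[scene]+=$38
        io[scene]+=$39
        iow[scene]+=$41

        # no of jobs per scene
        ++job[scene]
}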
END {
        print "SCENE_NAME","NO_OF_JOBS","WALLCLOCK","UTIME","STIME","CPU","MEMORY","IO","IOW"
        for ( i in wallclock ) {
                print i,job[i],wallclock[i],utime[i],stime[i],cpu[i],mem[i],io[i],iow[i]
        }
}' accounting
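
Once the CSV is saved to a file, ordinary shell tools can already answer simple questions before a spreadsheet is involved. Here is a minimal sketch that lists the scenes with the largest total wallclock time. The function name top_scenes and the file name scenes.csv are made up for this example, and it assumes the column order printed by the awk script above (WALLCLOCK is column 3).

```shell
#!/bin/sh
# top_scenes FILE [N]: print the CSV header, then the N rows with the
# largest WALLCLOCK (column 3), biggest first. N defaults to 10.
# FILE is assumed to be the output of the awk script, saved to disk.
top_scenes() {
    file=$1
    n=${2:-10}
    # keep the header line on top
    head -n 1 "$file"
    # sort the data rows numerically on column 3, descending
    tail -n +2 "$file" | sort -t, -k3,3nr | head -n "$n"
}
```

For example, "top_scenes scenes.csv 20" would show the 20 most expensive scenes.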

However, we are still dealing with a lot of data. How about visualising the jobs using gnuplot? With the raw accounting data, you can find the start and end time of each job. Tcl has an excellent utility for converting epoch time to other date/time formats. Although gnuplot can plot epoch time directly, I found that it always treats the values as GMT+0, which messes up the x-axis labels. Anyway, here is the Tcl program that extracts the start time and counts the jobs per day, per hour.

set fp [open accounting r]
while { [gets $fp line] >= 0 } {
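        # accounting(5) fields are colon-separated; Tcl list indices start at 0
        set lline [split $line :]

        # field 32: project
        set project [lindex $lline 31]

        if { $project ne "myproject" } {
                continue
        }

        # field 5: job name (not used below), field 10: start time as epoch time
        set jobname [lindex $lline 4]
        set starttime [lindex $lline 9]

        # truncate the local start time to the hour
        set ymdh [clock format $starttime -format {%Y-%m-%d %H:00:00}]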

        if { [info exists stats($ymdh)] == 0 } {
                set stats($ymdh) 1
        } else { 
                incr stats($ymdh)
        }
}
close $fp


foreach i [lsort [array names stats]] {
        puts "$i $stats($i)"
}

The corresponding output (stats.txt) looks like this:
...
2008-04-25 22:00:00 137
2008-04-25 23:00:00 83
2008-04-26 00:00:00 83
2008-04-26 01:00:00 86
2008-04-26 02:00:00 84
...
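
As a quick sanity check, the hourly counts should add up to the number of accounting records for the project. A minimal sketch, assuming stats.txt holds the Tcl output in the format shown here (date, time, count) and that the project is field 32 as in the scripts above; sum_hourly and count_project_jobs are names made up for this example.

```shell
#!/bin/sh
# sum_hourly FILE: add up the per-hour counts, i.e. the third
# whitespace-separated column of the Tcl script output.
sum_hourly() {
    awk '{ total += $3 } END { print total + 0 }' "$1"
}

# count_project_jobs FILE PROJECT: number of accounting records whose
# project field (field 32) equals PROJECT.
count_project_jobs() {
    awk -F: -v proj="$2" '$32 == proj { n++ } END { print n + 0 }' "$1"
}
```

If "sum_hourly stats.txt" and "count_project_jobs accounting myproject" print different numbers, some records were dropped or counted twice along the way.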

The gnuplot file below visualises the number of jobs started each hour across the entire project.

set terminal png
set output 'stats.png'
set xdata time
set size 1,0.5
set timefmt '%Y-%m-%d %H:%M:%S'
set xrange ['2008-04-23 00:00:00':'2008-05-14 23:59:00']
set yrange [0:]
set title 'Rendering - myproject'
set ylabel 'Jobs per hour'
set xtics 86400 offset 2
set format x "%d\n%b\n%a"
set grid
plot 'stats.txt' using 1:3 with impulses title '#Jobs Started'
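
The raw accounting data holds the end time of each job as well as the start time, so the elapsed time per job is easy to pull out too. A rough sketch, assuming start_time and end_time are fields 10 and 11 (as documented in accounting(5)) and are recorded in seconds; job_times is a name made up for this example.

```shell
#!/bin/sh
# job_times FILE PROJECT: for each record of PROJECT that actually started,
# print "account,start_time,end_time,elapsed" as CSV.
# Assumption: fields 10 and 11 are start and end time in epoch seconds.
job_times() {
    awk -F: -v proj="$2" -v OFS=, '
    # skip records with a zero start time (the job never ran)
    $32 == proj && $10 > 0 {
        print $7, $10, $11, $11 - $10
    }' "$1"
}
```

For example, "job_times accounting myproject > jobs.csv" gives one row per job that can be sorted or plotted in the same way as the hourly counts.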
