Chi Hung Chan: December 2008

Wednesday, December 31, 2008

Looping in Hex

On Linux, you have the seq utility (by the way, I realised OpenSolaris 2008.11 comes with seq too) that prints a sequence of numbers. It is very useful in loops because it saves you from incrementing a counter yourself. For example:
for i in `seq 1 10`; do echo $i; done

However, seq only accepts decimal (base 10) input arguments. What if I want to do the same in hex (base 16), e.g. loop from 00 to FF in steps of 5? At first I was thinking of creating two functions (hex2dec and dec2hex) so that I could do this:

convert the hex input arguments to decimal (using hex2dec)
increment the loop counter in decimal
convert any output to hex (using dec2hex)
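For comparison, here is a rough Python sketch of those three steps. It is only an illustration and not the approach used in the script below. int(text, 16) and format(n, "X") do the conversions inside a single process, so nothing gets forked. hex2dec and dec2hex mirror the shell helper names above; hex_range is a hypothetical name.

```python
def hex2dec(text):
    """'ff' or 'FF' -> 255 (int() accepts either case)."""
    return int(text, 16)


def dec2hex(number):
    """255 -> 'FF': uppercase hex, the same digits bc prints with obase=16."""
    return format(number, "X")


def hex_range(start, end, step="1"):
    """Hypothetical helper: hex in, loop in decimal, hex out."""
    lo, hi, inc = hex2dec(start), hex2dec(end), hex2dec(step)
    return [dec2hex(i) for i in range(lo, hi + 1, inc)]


print(hex_range("00", "FF", "5")[:4])  # ['0', '5', 'A', 'F']
```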

The above steps work fine, but they fork too many processes from the parent shell. Although I was already using bc (an arbitrary precision calculator language) to convert between base 10 and base 16, I had not explored looping within bc itself.

Did you know that the bc language provides a "for" loop construct? Now I can move my loop out of the shell script into bc, and I do not have to increment any counter. One catch: bc reads every number, including the value assigned to obase, in the current ibase. So ibase=16 followed by obase=16 actually sets an output base of 22 (hex 16). Rather than juggle the order, I convert the input hex numbers to decimal and let bc work out all the output in hex. I also included an optional -w flag to equalise the width by padding with leading zeroes.

Here is my seqhex.sh:

#! /bin/sh


hex2dec()
{
    hex=`echo $1 | tr '[a-z]' '[A-Z]'`
    echo "ibase=16;$hex" | bc
}


usage()
{
    echo "Usage: $0 [-w width] start end"
    echo "Usage: $0 [-w width] start step end"
}
    

width=""
while getopts w: c
do
    case $c in
    w)   width=$OPTARG ;;
    \?)  usage; exit 1  ;;
    esac
done
shift `expr $OPTIND - 1`


# input arguments in hex, convert to decimal
if [ $# -eq 2 ]; then
    start=`hex2dec $1`
    end=`hex2dec $2`
    step=1
elif [ $# -eq 3 ]; then
    start=`hex2dec $1`
    step=`hex2dec $2`
    end=`hex2dec $3`
else
    usage
    exit 2
fi


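# let bc run the loop and print in hex, then left-pad with zeroes up to
# $width (awk's "%0Ns" zero-padding of strings is not portable)
echo "obase=16; for(i=$start;i<=$end;i+=$step) { i }" | \
    bc | \
    awk -v w="$width" '{ s = $1; while (length(s) < w + 0) s = "0" s; print s }'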

seqhex.sh in action:

$ ./seqhex.sh
Usage: ./seqhex.sh [-w width] start end
Usage: ./seqhex.sh [-w width] start step end

$ ./seqhex.sh 6 F
6
7
8
9
A
B
C
D
E
F

$ ./seqhex.sh AA BB
AA
AB
AC
AD
AE
AF
B0
B1
B2
B3
B4
B5
B6
B7
B8
B9
BA
BB

$ ./seqhex.sh AA 5 BB
AA
AF
B4
B9

$ ./seqhex.sh -w 4 AA 5 BB
00AA
00AF
00B4
00B9
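If Python is available, the whole of seqhex.sh can also be sketched as a single process. This is a hypothetical port, not the script above. It assumes the same argument convention (start end, or start step end, all in hex) and uses str.zfill for the -w padding. The names seqhex and main are illustrative.

```python
import argparse


def seqhex(start, end, step="1", width=0):
    """Return hex strings from start to end inclusive, like seqhex.sh."""
    lo, hi, inc = int(start, 16), int(end, 16), int(step, 16)
    if inc <= 0:
        # a zero step would make the bc loop in seqhex.sh run forever
        raise ValueError("step must be a positive hex number")
    # format(..., "X") gives uppercase hex like bc; zfill adds leading zeroes
    return [format(i, "X").zfill(width) for i in range(lo, hi + 1, inc)]


def main(argv):
    parser = argparse.ArgumentParser(
        prog="seqhex.py", usage="%(prog)s [-w width] start [step] end")
    parser.add_argument("-w", dest="width", type=int, default=0)
    parser.add_argument("numbers", nargs="+")
    args = parser.parse_args(argv)
    if len(args.numbers) == 2:
        start, end = args.numbers
        step = "1"
    elif len(args.numbers) == 3:
        start, step, end = args.numbers
    else:
        parser.error("expected start end, or start step end")
    for value in seqhex(start, end, step, args.width):
        print(value)


main(["-w", "4", "AA", "5", "BB"])  # prints 00AA, 00AF, 00B4, 00B9
```

To use it as a command, call main(sys.argv[1:]) from the usual `if __name__ == "__main__":` guard instead of the hard-coded demo call.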

Labels: bc, shell script

Sunday, December 21, 2008

Power of Eval

The power of shell scripting, scripting languages or dynamic languages, whatever you want to call them, is the ability to create and execute arbitrary commands on the fly. Statically-typed, compiled languages generally do not offer this out of the box.

These dynamic languages perform variable substitution before execution. With the built-in eval command, the shell parses a command twice. The first pass performs the substitutions, and the resulting text is then handed back to the interpreter for final evaluation.

Here is an example of how we can make use of eval. In the Bourne shell, counting requires the following, a template you will see in almost every script:

count=0
count=`expr $count + 1`

With eval, we can generalise the above template into a reusable function. The function incr takes a variable name as its first argument and an optional increment step as its second argument (default 1). Below, incr is applied to the variables "counter1" and "counter2":

$ incr() { eval $1=\`expr \$\{$1:-0\} + ${2:-1}\`; }

$ echo $counter1


$ incr counter1

$ echo $counter1
1

$ incr counter1 5

$ echo $counter1
6

$ incr counter2 4

$ incr counter2 5

$ echo $counter2
9

When you run incr counter1, eval builds and runs the command
counter1=`expr ${counter1:-0} + 1`
The same happens for counter2 and any other variable. Now we can increment any variable without repeating the standard template. I hope this simple example shows you the power of eval.
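Python has an eval-like facility too. Below is a minimal sketch in which a plain dict (shell_vars, a hypothetical name) stands in for the shell's variables. The statement text is built first and then handed to exec, mirroring the two passes described above. In real Python code you would simply write d[name] = d.get(name, 0) + step. exec is used here only to show the build-then-run idea, and the name check guards against executing arbitrary text.

```python
def incr(namespace, name, step=1):
    """Rough Python analogue of the shell incr function (illustrative only)."""
    if not name.isidentifier():
        # never exec text built from unchecked input
        raise ValueError("not a valid variable name: %r" % name)
    # first pass: substitute the name and step into the source text, much as
    # the shell expands $1 and ${2:-1} before eval sees the command
    source = "{0} = globals().get('{0}', 0) + {1}".format(name, int(step))
    # second pass: hand the generated statement to the interpreter,
    # using namespace as its global variables
    exec(source, namespace)


shell_vars = {}
incr(shell_vars, "counter1")
incr(shell_vars, "counter1", 5)
incr(shell_vars, "counter2", 4)
incr(shell_vars, "counter2", 5)
print(shell_vars["counter1"], shell_vars["counter2"])  # 6 9
```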

Labels: shell script

Monday, December 15, 2008

Google Chart

I blogged about Google Chart some time ago.

If you like charting, you will definitely appreciate this Google Chart Generator.

Labels: google

Find the Last Day of the Previous Month

If you need to generate a monthly report for the previous month and want cron to automate the scheduling, you need this script to determine exactly how many days the previous month had. For example, sar (system activity reporter) by default keeps its data for only a month before overwriting it. Suppose you need to dump out the CPU utilisation for the entire previous month. You schedule cron to run on the 1st day of the month and find all the sar files using
find /var/adm/sa -type f -mtime -30 -name "sa*"
if the previous month had 30 days.

Here is a short Python script that can do the job:

$ date
Mon Dec 15 21:00:08 MPST 2008

$ cat lastmth.py
#! /usr/bin/python

import datetime

# the last day of last month is one day before the 1st of this month
mth_this_now   = datetime.date.today()
mth_this_start = datetime.date(mth_this_now.year, mth_this_now.month, 1)
mth_last_end   = mth_this_start - datetime.timedelta(days=1)
print("%d %d %d" % (mth_last_end.year, mth_last_end.month, mth_last_end.day))

$ ./lastmth.py
2008 11 30
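The same trick (one day before the 1st of this month) can be wrapped in a reusable function and used to build the find command with the correct day count. This is a sketch. last_day_of_previous_month and find_command are hypothetical names, and the /var/adm/sa default is taken from the find example above.

```python
import datetime


def last_day_of_previous_month(today=None):
    """One day before the 1st of the current month, as in lastmth.py."""
    if today is None:
        today = datetime.date.today()
    return today.replace(day=1) - datetime.timedelta(days=1)


def find_command(today=None, sa_dir="/var/adm/sa"):
    """Hypothetical helper: the find command for last month's sar files."""
    last = last_day_of_previous_month(today)
    # last.day is the number of days in the previous month
    return ["find", sa_dir, "-type", "f",
            "-mtime", "-%d" % last.day, "-name", "sa*"]


print(last_day_of_previous_month(datetime.date(2008, 12, 15)))  # 2008-11-30
print(" ".join(find_command(datetime.date(2008, 12, 1))))
# find /var/adm/sa -type f -mtime -30 -name sa*
```

If you paste the joined command into a crontab or shell, remember to quote "sa*" as in the original find example.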

Labels: python, unix

Date Imprint on Photos

I just came back from a holiday with tonnes of photos to organise. My new 8-megapixel camera, which I bought two months ago, can only imprint the date if it is set to 2-megapixel resolution. At such a low resolution, I may not be able to print them at 8R or bigger. Anyway, I care more about resolution and file size than about the date. So, no date imprint.

Last night when I was talking to my brother-in-law, I realised that he had encountered the same issue. He went looking for a Windows freeware tool to do the date imprint.

Did you know that ImageMagick can do this job? The best part is that it runs on Cygwin under Windows. Here is my script for this task. It imprints the photo's modified timestamp, using Python to format the date.

#! /bin/sh
#
# imprint the modified time of image (in yyyy-mm-dd format) 
# at the bottom right hand corner
#


PATH=/usr/bin:/bin


usage()
{
    echo "Usage: $0 input-image output-image"
    echo ""
}


if [ $# -ne 2 ]; then
    usage
    exit 1
fi


#
# find mtime of input image and output as yyyy-mm-dd
#
# (the file name is passed via sys.argv so quotes in it cannot break the code)
ymd=`python -c "import os,sys,datetime;\
    t=os.path.getmtime(sys.argv[1]);\
    d=datetime.date.fromtimestamp(t);\
    print(d.strftime('%Y-%m-%d'))" "$1"`


#
# if you are getting the error below, create a symbolic link so that cygwin
# points to the Windows fonts
#
#       convert: unable to read font `/usr/lib/ImageMagick-6.4.0/config//usr/share/fonts /corefonts/arial.ttf'.
#
#       ln -s /cygdrive/c/Windows/Fonts /usr/share/fonts/corefonts
#
convert "$1" -gravity SouthEast -fill orange -pointsize 60 \
    -draw "text 50,25 '$ymd'" -quality 100 "$2"

I downloaded this picture as a proof of concept. Here is the output, generated with -quality 50 to reduce the file size.
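If you prefer to drive ImageMagick from Python directly, here is a hedged sketch of the same steps. It assumes convert is on the PATH, copies the option values from the script above, and uses hypothetical function names. Passing the arguments to subprocess as a list avoids the quoting issues that file names with spaces or quotes can cause in a shell.

```python
import datetime
import os
import subprocess


def ymd_from_timestamp(stamp):
    """Format a POSIX timestamp as yyyy-mm-dd in local time."""
    return datetime.date.fromtimestamp(stamp).strftime("%Y-%m-%d")


def mtime_ymd(path):
    """The file's modification date, like the python -c one-liner above."""
    return ymd_from_timestamp(os.path.getmtime(path))


def imprint_command(src, dst, ymd):
    """Argument list for the same convert call as the shell script."""
    return ["convert", src,
            "-gravity", "SouthEast", "-fill", "orange", "-pointsize", "60",
            "-draw", "text 50,25 '%s'" % ymd,
            "-quality", "100", dst]


def imprint(src, dst):
    """Imprint src's modified date at the bottom right and write dst."""
    # a list argument means no shell is involved, so odd file names are safe
    subprocess.check_call(imprint_command(src, dst, mtime_ymd(src)))
```

For example, imprint("IMG_0001.JPG", "IMG_0001-dated.JPG"), where both file names are hypothetical.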

Labels: Cygwin, ImageMagick, python

Friday, December 05, 2008

Avoid Using Temporary Files, Part 2

Last Saturday, I blogged about how we can avoid using temporary files in shell scripting. At the end of that post, I posed a question: how can we achieve this if the command outputs do not all have the same number of lines?

My first implementation started off with two commands with unequal output, and I managed that without much difficulty. I thought I was done. Wait! What if there are more than two command outputs? I would have to rewrite it again. Why not do it once and for all and craft a more generic function that can handle any number of command outputs?

My approach takes advantage of a subshell. I also introduce a "separator" between the commands so that the "paste" step can tell the outputs apart. Of course, your separator has to be unique and must not appear in any of the command outputs. I define my own "_paste_" command using AWK and store the command outputs in memory in an AWK associative array keyed on the file number and line number. Here is my code:

$ cat t3.sh
#! /bin/sh

PATH=/usr/bin:/bin

seq()
{
    nawk -v start=$1 -v end=$2 '
        END {for(i=start;i<=end;++i){print i}}' /dev/null
}
calc1()
{
    for i in `seq $1 $2`
    do
        echo `expr $i \* $i`
    done
}
calc2()
{
    for i in `seq $1 $2`
    do
        echo `expr $i \* $i \* $i`
    done
}
_paste_()
{
    nawk -v sep=$sep '
        BEGIN {
            nfile=1
            nline=1
            max=0
        }
        $0==sep {
            ++nfile
            nline=1
            next
        }
        {
            if ( nline>max ) {
                max=nline
            }
            line[nfile,nline]=$0
            ++nline
        }
        END {
            for (l=1;l<=max;++l) {
                printf("%s", line[1,l])
                for (f=2;f<=nfile;++f) {
                    printf("\t%s", line[f,l])
                }
                printf("\n")
            }
        }'
}


sep="@@@@@"
(
   calc1 1 10; echo $sep
   calc2 1 13; echo $sep
   calc2 1 15
) | _paste_ 

$ ./t3.sh
1       1       1
4       8       8
9       27      27
16      64      64
25      125     125
36      216     216
49      343     343
64      512     512
81      729     729
100     1000    1000
        1331    1331
        1728    1728
        2197    2197
                2744
                3375

This implementation may not be very efficient, especially with massive command outputs, because all the data is stored in memory. What I have in mind is to do this in Python. Want to give it a try?
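One possible answer to that question is a short Python sketch like the following, which is not part of the original script. It splits the input at each separator line and uses itertools.zip_longest to pad the shorter columns with empty strings. Like the AWK version, it still holds all the data in memory, because a pipe can only be read once. The function names are hypothetical.

```python
from itertools import zip_longest


def split_segments(lines, sep):
    """Split a stream of lines into one list per command, at each separator."""
    segments = [[]]
    for line in lines:
        line = line.rstrip("\n")
        if line == sep:
            segments.append([])  # the next command's output starts here
        else:
            segments[-1].append(line)
    return segments


def paste(lines, sep="@@@@@"):
    """Join the segments side by side with tabs, padding short columns."""
    columns = split_segments(lines, sep)
    return ["\t".join(row) for row in zip_longest(*columns, fillvalue="")]


print(paste(["1", "4", "@@@@@", "1", "8", "27"]))  # ['1\t1', '4\t8', '\t27']
```

To use it as a filter in place of _paste_, pass it sys.stdin and print each returned row.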

Labels: awk, shell script

Thursday, December 04, 2008

OpenSolaris 2008.11 Finally Arrived

OpenSolaris 2008.11 finally arrived yesterday, but in December (12), not November (11). Anyway, the installation on my Sun xVM VirtualBox was pretty smooth.

The previous release, 2008.05, was not very pleasant because the Package Manager GUI failed to launch for no apparent reason. 2008.11 seems pretty good, and now I am sitting in a coffee shop downloading Sun Studio.

There is a lot of good stuff in this new release, for example ZFS Time Slider (something like Mac OS X's Time Machine).

If you do not have time to read up, you may want to just watch this webcast of just over 12 minutes.

Labels: opensolaris

Tuesday, December 02, 2008

Turned A No-Brainer Task Into A Challenging Job

Yesterday I was tasked with writing up a capacity report that used to be prepared by the administrative staff. The instructions given to the admin staff were to look at the weekly graph (generated by RRDtool) and pick the busy day with the highest CPU utilisation. Once the date is identified, he or she views that particular day's CPU graph. If the utilisation stays above a certain threshold continuously for a predefined period, the server is classified as either Amber or Red depending on the threshold level.

Yes, this is a no-brainer job. However, if you have to do it for nearly a hundred servers, a no-brainer job becomes a nerve-racking one. Sooner or later you will swear like hell. BTW, I did swear too. After all the swearing, I wondered whether I could do a better job than the admin staff used to. I cannot possibly do this manually every month, right?

After some exploratory work to understand how the files store this information, I realised that I could do it programmatically by dumping the RRD files into ASCII text, in this case XML. My next question was: shall I use an XML parser to extract the information? Not in this case, because the system does not have any XML toolkit installed. Also, some XML toolkits may filter out the comments, which I need to tap into because they carry the timestamp in yyyy-mm-dd format and the epoch time. This information is essential for determining whether the server is "amber" or "red".

I always believe that I can extract anything from output generated by a program, because such output ought to have a pattern. I am showing you a dump of the RRD at the end of this post. Can you see the pattern?

I will not show you my code because it is rather involved and messy. However, I will describe my approach. Basically I use a lot of UNIX pipes between a mixture of AWK and sed. FYI, I avoided temporary files for all the processing.

In my case, the 1st <database> stores the daily info, the 2nd the weekly, the 3rd the monthly and the 4th the yearly (depending on how you create your RRD)
Use AWK/sed to pick up the data from the 2nd <database>, ignore the NaN (not a number) records and extract the timestamps
Pipe that into another AWK to work out which date in the week has the highest CPU utilisation
Open up that day's RRD (apparently it is stored in another RRD)
Retrieve the 1st <database> data, which is the daily data
Work out the time difference between the records that are above the threshold
Count the records above the threshold. Suppose the polling interval is 5 minutes: we should then see a continuous run of 300-second time differences in the filtered records
If the count exceeds the specified period (a continuous hour means 12 data points), flag the server as either Amber or Red depending on the threshold
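Steps 6 to 8 can be sketched in Python as follows. This is not the actual AWK/sed pipeline. It assumes the samples have already been extracted as (epoch, value) pairs sorted by time, and the Amber/Red thresholds (70 and 90) are hypothetical placeholders for whatever your capacity policy says. A run only continues while consecutive filtered records are exactly 300 seconds apart. The function returns None when the server is neither Amber nor Red.

```python
def longest_run(samples, threshold, interval=300):
    """Longest streak of above-threshold records spaced exactly `interval` apart.

    samples: list of (epoch_seconds, value) pairs sorted by time.
    """
    # keep only the records above the threshold, as in the filtering step
    above = [epoch for epoch, value in samples if value > threshold]
    best = run = 0
    prev = None
    for epoch in above:
        # a 300-second gap continues the streak; anything else starts a new one
        if prev is not None and epoch - prev == interval:
            run += 1
        else:
            run = 1
        best = max(best, run)
        prev = epoch
    return best


def classify(samples, amber=70.0, red=90.0, points=12, interval=300):
    """Return "Red", "Amber" or None. Thresholds are hypothetical placeholders."""
    if longest_run(samples, red, interval) >= points:
        return "Red"
    if longest_run(samples, amber, interval) >= points:
        return "Amber"
    return None


start = 1224813000
busy_hour = [(start + 300 * i, 95.0) for i in range(12)]
print(classify(busy_hour))  # Red
```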

I hope you are still with me. The moral of the story is not the above steps. It is that we should always try to find joy in our work, no matter how dumb it is. It looked like a no-brainer job at first, but in the end it turned out to be a pretty challenging one.

Here is the RRD file dump:

$ rrdtool dump some-rrd-file.rrd
<!-- Round Robin Database Dump -->
<rrd>
    <version> 0001 </version>
    <step> 15 </step> <!-- Seconds -->
    <lastupdate> 1222743000 </lastupdate> <!-- 2008-09-30 10:50:00 SGT -->

    <ds>
        <name> ds0 </name>
        <type> GAUGE </type>
        <minimal_heartbeat> 600 </minimal_heartbeat>
        <min> 0 </min>
        <max> 1.0000000e+02  </max>

        <!-- PDP Status -->
        <last_ds> 4.3240000e+01 </last_ds>
        <value> 0.0000000000e+00 </value>
        <unknown_sec> 0 </unknown_sec>
    </ds>

<!-- Round Robin Archives -->
    <rra>
        <cf> AVERAGE </cf>
        <pdp_per_row> 1 </pdp_per_row> <!-- 300 seconds -->
        <xff> 5.0000000000e-01 </xff>

        <cdp_prep>
            <ds><value> NaN </value>  <unknown_datapoints> 0 </unknown_datapoints></ds>
        </cdp_prep>
        <database>
            <!-- 2008-10-24 09:50:00 SGT / 1224813000 --> <row><v> 1.5000000e+01 </v></row>
            <!-- 2008-10-24 09:55:00 SGT / 1224813300 --> <row><v> 1.0234000e+01 </v></row>
            .....
        </database>
    </rra>
    <rra>
        ....
        <database>
            <!-- 2008-10-20 12:00:00 SGT / 1224475200 --> <row><v> 3.1365000e+01 </v></row>
            <!-- 2008-10-20 12:30:00 SGT / 1224477000 --> <row><v> 2.4532000e+01 </v></row>
            .....
        </database>
    </rra>
    <rra>
        ....
        <database>
            .....
        </database>
    </rra>
</rrd>
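The pattern is in the comments: each row is preceded by the date, time, time zone and epoch. A minimal Python sketch of steps 1 to 3 might look like this. It uses regular expressions rather than an XML parser, assumes one data source (one value) per row, and treats "busiest" as the day with the highest single sample. The function names are hypothetical.

```python
import re

# matches rows such as:
#   <!-- 2008-10-24 09:50:00 SGT / 1224813000 --> <row><v> 1.5000000e+01 </v></row>
ROW = re.compile(
    r"<!--\s*(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s*/\s*(\d+)\s*-->"
    r"\s*<row><v>\s*(\S+)\s*</v></row>"
)


def database_sections(text):
    """Return the body of each <database> block, in file order."""
    return re.findall(r"<database>(.*?)</database>", text, re.S)


def parse_rows(text):
    """Yield (date, epoch, value) for every row, skipping NaN values."""
    for match in ROW.finditer(text):
        date, _time, epoch, value = match.groups()
        if value.lower() == "nan":
            continue
        yield date, int(epoch), float(value)


def busiest_day(rows):
    """Return the date whose highest single sample is the largest."""
    peak = {}
    for date, _epoch, value in rows:
        peak[date] = max(value, peak.get(date, value))
    return max(peak, key=peak.get) if peak else None
```

With the layout described earlier, database_sections(dump_text)[1] would hold the weekly data to feed into parse_rows and busiest_day.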

Labels: awk, sed, shell script, XML
