Thursday, September 25, 2008

Paste a Few Files Into One and Plot ...

Suppose you have been monitoring a few parameters of a service periodically and writing each one to a separate file. Each line in a file is a colon-separated key-value pair, where the key is a timestamp in YYYYmmddHHMMSS format. Now you need to plot several of these parameters together in a single graph.

Example:
File 1 Line 1 = 20080910111213:14
File 2 Line 1 = 20080910111213:11
File 3 Line 1 = 20080910111213:23
These lines need to be put together on one line, so that a generic plotting tool can be reused to visualise all the values in a single graph:
Output Line 1 = 20080910111213:14:11:23
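
For exactly three files, the target line in the example can be sketched with paste and cut alone. This is only an illustration of the goal, not the script from this post: merge_three is a hypothetical name, the field numbers 1,2,4,6 assume every input line has exactly two colon-separated fields, and the timestamp is left untouched.

```shell
#!/bin/sh
# Sketch only: merge_three is a hypothetical helper for exactly three files.
# paste joins the rows as stamp:v1:stamp:v2:stamp:v3, then cut keeps the
# first timestamp (field 1) and the three values (fields 2, 4 and 6).
merge_three() {
    paste -d: "$1" "$2" "$3" | cut -d: -f1,2,4,6
}

# Usage (file names are placeholders):
#   merge_three file1 file2 file3
```

With the three example lines as input, this prints 20080910111213:14:11:23. The hard-coded field list is what stops it from scaling to an arbitrary number of files.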

A couple of assumptions in this work:

  • data in each file is colon separated, with
    1st field = timestamp in YYYYmmddHHMMSS format
    2nd field = value
  • all files have the same timestamp in each corresponding row
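
The second assumption is easy to check before merging. A minimal sketch, assuming the files are small enough to hold their timestamp columns in shell variables; same_stamps is a hypothetical helper name and is not part of the script in this post.

```shell
#!/bin/sh
# Sketch only: same_stamps is a hypothetical helper.
# It succeeds when the first colon-separated field of every file matches
# the first file row for row, and fails on the first file that differs.
same_stamps() {
    ref_stamps=$(cut -d: -f1 "$1")
    shift
    for f in "$@"; do
        if [ "$(cut -d: -f1 "$f")" != "$ref_stamps" ]; then
            echo "timestamp mismatch in $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage (file names are placeholders):
#   same_stamps file1 file2 file3 || exit 1
```

A file with missing or extra rows also fails the comparison, since its timestamp column no longer equals the reference column.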
Here is the script:
#! /bin/sh

if [ $# -lt 2 ]; then
 echo "Usage: $0 file file [file ...]"
 exit 1
fi


#
# assumptions:
# 1. assume data in each file is colon separated with
#    1st field: time stamp format in YYYYmmddHHMMSS
#    2nd field: value
#    Eg. 20080909121314:2
# 2. all files have the same time stamp in each corresponding row
#


prefix=".tmprrd-$$"
sep=":"


#
# modify the time stamp for easy parsing in Tcl (clock scan)
#
count=1
suffix=`echo $count | awk '{printf("%03d",$1)}'`
awk -F"$sep" '{printf("%sT%s:%d\n",substr($1,1,8),substr($1,9,6),$2)}' "$1" > \
 "${prefix}-${suffix}"
count=`expr $count + 1`


#
# for each file (starting from 2nd), extract 2nd field to individual file
#
shift
for i in "$@"
do
 suffix=`echo $count | awk '{printf("%03d",$1)}'`
 awk -F"$sep" '{print $2}' "$i" > "${prefix}-${suffix}"
 count=`expr $count + 1`
done


#
# 'paste' them together
#
paste -d "$sep" ${prefix}-*


#
# cleanup
#
rm -f ${prefix}-*

As you can see, I reformat the time stamp from the first file and store it, along with its value, in a temporary file with the suffix "-001". The time stamp is rewritten as YYYYmmddTHHMMSS, a format that Tcl's clock scan accepts, because the generic plotting tool is implemented in Tcl. The actual first output line for the example above is therefore 20080910T111213:14:11:23.

For the remaining files (the first one has already been taken care of), I write the values to separate temporary files whose suffix is a 3-digit, zero-padded running number. Because the shell expands wild cards in sorted order, the zero padding ensures that the sequence of the value columns corresponds to the sequence of the input files. With this, I can just paste ${prefix}-*.
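
The effect of the zero padding on wild card ordering can be seen in a small experiment. This is a sketch, assuming mktemp -d is available (it is common but not required by POSIX); pad3 and demo_order are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Sketch only: pad3 and demo_order are hypothetical helpers.

# Zero pad a number to 3 digits, like the awk printf("%03d") in the script.
pad3() {
    printf '%03d' "$1"
}

# Create files numbered 1, 2 and 10, with and without padding, and print
# each set in the order the shell expands the wild card.
demo_order() {
    dir=$(mktemp -d) || return 1
    for n in 1 2 10; do
        : > "$dir/plain-$n"
        : > "$dir/padded-$(pad3 "$n")"
    done
    ( cd "$dir" && echo plain-* && echo padded-* )
    rm -rf "$dir"
}

demo_order
# prints:
#   plain-1 plain-10 plain-2
#   padded-001 padded-002 padded-010
```

Without padding, the tenth file would sort between the first and the second, and its column would end up in the wrong position in the pasted output.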

The output is a beautiful graph.
