Calculate Min/Max/Sum/Avg/RMS/Std in a Script
I have been working on the Netflix predictions lately and have to find out the min, max, avg, root mean square, standard deviation every now and then. I have been typing a lot of these one-liners to work out some of these values and it is about time to put that all in one script.
Some of the sample one-liners to work out the sum and average:
awk '{s+=$1} END {print s}' data
awk '{s+=$1} END {print s/NR}' data
Below is the script that has been generalised to handle most of the situations such as applying statistical method on specific column, ability to handle various type of field separator.
$ cat ~/bin/calc.sh
#! /bin/sh
usage()
{
echo "Usage: $0 [-h] [-c column] [-m sum|avg|max|min|rms|std] [-f field_sep] [file ...]"
echo "\\tDefault: column 1, sum, white space, standard input"
}
#
# get arguments and flags
#
set -- `getopt c:m:f: $* 2>/dev/null`
if [ $? != 0 ]; then
usage
exit 1
fi
column=1
method=sum
fs='[ \t]+'
for i in $*; do
case $i in
-c)
column=$2
shift 2
;;
-f)
fs=$2
shift 2
;;
-m)
method=$2
shift 2
;;
-h)
usage
exit 1
;;
--)
shift
break
;;
esac
done
#
# default, standard input channel
#
args=${*:--}
#
# check arguments
#
if [ "$method" != "sum" -a \
"$method" != "max" -a \
"$method" != "min" -a \
"$method" != "avg" -a \
"$method" != "rms" -a \
"$method" != "std" ]; then
echo "Error. Method $method unsupported"
exit 2
fi
echo $column | egrep '^[1-9][0-9]*$' > /dev/null 2>&1
if [ $? -ne 0 ]; then
echo "Error. Column number has to be integer"
exit 2
fi
nawk -v col="$column" -v met="$method" -v fs="$fs" '
BEGIN {
FS=fs
max=-99999999999
min=99999999999
}
col<=NF{
if ( met == "sum" || met == "avg" ) {
sum+=$col
++count
} else if ( met == "std" ) {
sum+=$col
++count
term[count]=$col
} else if ( met == "std" ) {
diff=avg-$col
sum+=diff*diff
++count
} else if ( met == "rms" ) {
sum+=$col*$col
++count
} else if ( met == "max" ) {
if ( $col > max ) {
max=$col
}
} else if ( met == "min" ) {
if ( $col < min ) {
min=$col
}
}
}
END {
if ( met == "sum" ) {
print sum
} else if ( met == "avg" ) {
print sum/count
} else if ( met == "std" ) {
avg=sum/count
for(i in term) {
diff=term[i]-avg
std+=diff*diff
}
print sqrt(std/count)
} else if ( met == "rms" ) {
print sqrt(sum/count)
} else if ( met == "max" ) {
print max
} else if ( met == "min" ) {
print min
}
}' $args
See my script in action
$ ~/bin/calc.sh -h
Usage: /export/home/chihung/bin/calc.sh [-h] [-c column] [-m sum|avg|max|min|rms|std] [-f field_sep] [file ...]
Default: column 1, sum, white space, standard input
$ ~/bin/calc.sh -c 1 -m avg /tmp/x
3.51955
$ ~/bin/calc.sh -c 2 -m min /tmp/x
1.0
$ ~/bin/calc.sh -c 2 -m max /tmp/x
4.4
$ awk 'BEGIN{OFS=":"}{print $1,$2}' /tmp/x | ~/bin/calc.sh -c 1 -m avg -f :
3.51955
Labels: awk, shell script


0 Comments:
Post a Comment
<< Home