During one of the demonstrations (or benchmarks, whatever you want to call it), we were asked to show how Sun Grid Engine (SGE) can be used to solve a recursive calculation.
The formula is:
$I_n = I_{n-1} + I_{n-2}$, with $I_0 = 1$ and $I_1 = 2$
If n=4, the answer is 8; if n=5, the answer is 13.
Easy, right? Yes, if you program it in your favourite programming language. But not with a grid engine, trust me.
As a plain shell script it looks like this, easy peasy:
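#! /bin/sh
# The script calls itself through $0, so run it by path, e.g. ./fib.sh 4
# add the two sub-results
calculate()
{
    echo `expr $1 + $2`
}
if [ $# -ne 1 ]; then
    echo "Usage: $0 <no>"
    exit 1
fi
num=$1
if [ $num -eq 0 ]; then
    result=1                  # I0 = 1
elif [ $num -eq 1 ]; then
    result=2                  # I1 = 2
else
    d1=`expr $num - 1`
    d2=`expr $num - 2`
    # recurse: run this same script for n-1 and n-2 in child processes
    result1=`$0 $d1`
    result2=`$0 $d2`
    result=`calculate $result1 $result2`
fi
echo $result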
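For comparison, here is a minimal iterative sketch with the same base cases as the script above (I0 = 1, I1 = 2). The function name iterate is hypothetical and not part of the original scripts. Unlike the recursive script, which starts a new process for every node of the call tree, it computes each value once, which makes it handy for checking the expected answers.

```shell
#!/bin/sh
# iterate N: prints I_N where I_0 = 1, I_1 = 2, I_n = I_(n-1) + I_(n-2)
# (hypothetical helper; a loop instead of one process per sub-problem)
iterate()
{
    prev=1    # I_(k-1), starts as I_0
    cur=2     # I_k, starts as I_1
    if [ "$1" -eq 0 ]; then
        echo $prev
        return
    fi
    k=1
    while [ $k -lt "$1" ]; do
        next=`expr $prev + $cur`
        prev=$cur
        cur=$next
        k=`expr $k + 1`
    done
    echo $cur
}
iterate 4    # prints 8
iterate 5    # prints 13
```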
Let's visualise the call tree to get a feel for what we are dealing with, starting with n=4:
        4
      /   \
     3     2
    / \   / \
   2   1 1   0
  / \
 1   0
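In the SGE version, every node of this tree becomes one job, so the size of the tree tells us how many jobs a run submits. Counting nodes follows almost the same recurrence, plus one for the node itself: J(0) = J(1) = 1 and J(n) = 1 + J(n-1) + J(n-2). A minimal sketch (the function name count_jobs is hypothetical):

```shell
#!/bin/sh
# count_jobs N: number of nodes in the call tree for N, i.e. the number of
# jobs submitted for N, root job included (hypothetical helper).
# J(0) = J(1) = 1, J(n) = 1 + J(n-1) + J(n-2)
count_jobs()
{
    if [ "$1" -lt 2 ]; then
        echo 1
        return
    fi
    jm2=1    # J(k-2)
    jm1=1    # J(k-1)
    k=2
    while [ $k -le "$1" ]; do
        j=`expr 1 + $jm1 + $jm2`
        jm2=$jm1
        jm1=$j
        k=`expr $k + 1`
    done
    echo $j
}
count_jobs 4    # prints 9, the nodes drawn above
count_jobs 5    # prints 15
```

For n=5 this gives 15, which matches the job IDs 463 to 477 in the qstat output of the run shown later.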
How can we use SGE to simulate this recursion? Once the jobs are submitted to the execution nodes, how do the two children know who their parent is? How do we keep track of their results, and how do those results get passed back to the parent? And how does it all eventually stop and wind back up to the root node?
Those were the questions in my mind when I started this exercise. Because SGE is not a sub-second scheduler, I can "qsub" two jobs and immediately establish the parent-child relationship using "-hold_jid" (hold on job dependency) via "qalter". For the children to know who their parent is, I passed "-v PARENT=value" to "qsub", so that they receive it in the PARENT environment variable.
How do we keep track of the result at every stage? I used temporary files, one per job and named after the job, so these names must be unique. This can be done by taking an MD5 digest of 32 bytes (any number of bytes would do) read from the /dev/urandom device on Solaris.
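A minimal sketch of the naming trick as a reusable function. The name unique_name is hypothetical, and the non-Solaris branch is an assumption: it swaps Solaris "digest -a md5" for GNU coreutils "md5sum", which prints the digest followed by a file name, hence the "cut".

```shell
#!/bin/sh
# unique_name: prints "jn-" followed by the MD5 digest of 32 random bytes.
# On Solaris it uses "digest -a md5" as rec.sh does; elsewhere it ASSUMES
# GNU coreutils md5sum is installed (hypothetical helper).
unique_name()
{
    if [ "`uname -s`" = SunOS ]; then
        hash=`dd if=/dev/urandom bs=32 count=1 2>/dev/null | digest -a md5`
    else
        hash=`dd if=/dev/urandom bs=32 count=1 2>/dev/null | md5sum | cut -d' ' -f1`
    fi
    echo "jn-$hash"
}
unique_name    # e.g. jn- followed by 32 hex digits, different on every call
```

Note that "qstat" shows only the first 10 characters of such a name (e.g. jn-1c84996), while the full name is what the result file is called.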
At first, my SGE environment was configured with 12 slots, and I could only run n=3 and n=4; at n=5 it simply ran out of slots. That is when I realised that slots in a queue are very similar to the stack in a computer. A recursive program pushes a frame onto the stack for every call and lets the pushes and pops sort themselves out. If the stack is too small, the recursive program breaks. The same happens with slots in SGE, except that the jobs do not crash: they stay on hold forever until you "qdel" them.
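One way to avoid a parent that waits forever is to put an upper bound on its polling loop. The sketch below is a hypothetical variant of getResult from rec.sh; wait_for_result and its arguments are made up for illustration. As a swapped-in simplification, it waits for the child's result file to appear instead of querying "qacct", and gives up after a given number of one-second polls, so a parent that cannot make progress exits with an error (freeing its slot) instead of blocking silently. It does not cure the slot shortage itself.

```shell
#!/bin/sh
# wait_for_result DIR NAME MAX_TRIES  (hypothetical helper)
# Prints the contents of DIR/NAME once it exists and is non-empty.
# Polls once per second and returns 1 after MAX_TRIES unsuccessful polls.
wait_for_result()
{
    tries=0
    while [ ! -s "$1/$2" ]; do
        if [ $tries -ge "$3" ]; then
            echo "gave up waiting for $2" >&2
            return 1
        fi
        sleep 1
        tries=`expr $tries + 1`
    done
    cat "$1/$2"
}
```

In rec.sh it could be called as result1=`wait_for_result $jndir $jn1 600` || exit 1, where 600 is an arbitrary limit. Checking -s (exists and non-empty) rather than mere existence narrows, but does not fully close, the window in which a half-written file could be read.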
For n >= 5, I had to increase the number of slots to make it work. Below are the two scripts: rec.sh, which does all the recursion, and cal.sh, which does the calculation. The session also shows "qstat" in action at 5-second intervals. The job name ("-N") in "qsub" is needed because rec.sh writes each result to a file named after the job ($JOB_NAME); that is how the final value ends up in the file "five".
$ cat rec.sh
#! /bin/sh
#$ -cwd
#$ -S /bin/sh
#$ -o /dev/null
#$ -e /dev/null
# one result file per job, in tmp/ under the submit directory (-cwd)
jndir=tmp
mkdir -p $jndir
# loop around qacct to wait for result:
# qacct only reports a job once it has finished
getResult()
{
    while :
    do
        qacct -j $1 > /dev/null 2>&1
        if [ $? -eq 0 ]; then
            cat $jndir/$1
            return
        fi
        sleep 1
    done
    # it shouldn't reach here
    exit 1
}
if [ $# -ne 1 ]; then
    echo "Usage: $0 <no>"
    exit 1
fi
. /opt/n1ge/default/common/settings.sh
number=$1
# SGE sets JOB_NAME to the name given with "qsub -N"
jobname=$JOB_NAME
if [ $number -eq 0 ]; then
    result=1
elif [ $number -eq 1 ]; then
    result=2
else
    d1=`expr $number - 1`
    d2=`expr $number - 2`
    # unique child job names: MD5 digest of 32 random bytes
    jn1="jn-`dd if=/dev/urandom bs=32 count=1 2>/dev/null | digest -a md5`"
    jn2="jn-`dd if=/dev/urandom bs=32 count=1 2>/dev/null | digest -a md5`"
    # submit the two children, passing this job's name as PARENT
    qsub -N $jn1 -v PARENT=$jobname $0 $d1 > /dev/null 2>&1
    qsub -N $jn2 -v PARENT=$jobname $0 $d2 > /dev/null 2>&1
    # if there is parent-child relationship, establish it:
    # this job now depends on its two children
    if [ "X$PARENT" != "X" ]; then
        qalter -hold_jid $jn1,$jn2 $jobname > /dev/null 2>&1
    fi
    # block until both children have finished, then add their results
    result1=`getResult $jn1`
    result2=`getResult $jn2`
    result=`./cal.sh $result1 $result2`
fi
# write result to file
echo $result > $jndir/$jobname
$ cat cal.sh
#! /bin/sh
echo `expr $1 + $2`
$ qsub -N five rec.sh 5
Your job 463 ("five") has been submitted.
$ while :
do
qstat
sleep 5
done
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
464 0.00000 jn-1c84996 chihung qw 07/27/2007 23:15:31 1
465 0.00000 jn-0ed40af chihung qw 07/27/2007 23:15:31 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
464 0.00000 jn-1c84996 chihung qw 07/27/2007 23:15:31 1
465 0.00000 jn-0ed40af chihung qw 07/27/2007 23:15:31 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
464 0.00000 jn-1c84996 chihung qw 07/27/2007 23:15:31 1
465 0.00000 jn-0ed40af chihung qw 07/27/2007 23:15:31 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung hr 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
466 0.00000 jn-66ad4b8 chihung qw 07/27/2007 23:15:46 1
467 0.00000 jn-1ddba59 chihung qw 07/27/2007 23:15:46 1
468 0.00000 jn-156ac6d chihung qw 07/27/2007 23:15:46 1
469 0.00000 jn-260acb5 chihung qw 07/27/2007 23:15:46 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung hr 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
466 0.00000 jn-66ad4b8 chihung qw 07/27/2007 23:15:46 1
467 0.00000 jn-1ddba59 chihung qw 07/27/2007 23:15:46 1
468 0.00000 jn-156ac6d chihung qw 07/27/2007 23:15:46 1
469 0.00000 jn-260acb5 chihung qw 07/27/2007 23:15:46 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung hr 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
466 0.00000 jn-66ad4b8 chihung qw 07/27/2007 23:15:46 1
467 0.00000 jn-1ddba59 chihung qw 07/27/2007 23:15:46 1
468 0.00000 jn-156ac6d chihung qw 07/27/2007 23:15:46 1
469 0.00000 jn-260acb5 chihung qw 07/27/2007 23:15:46 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung hr 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung hr 07/27/2007 23:16:01 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
466 0.55500 jn-66ad4b8 chihung hr 07/27/2007 23:16:01 all.q@sgeexec2 1
469 0.55500 jn-260acb5 chihung hr 07/27/2007 23:16:01 all.q@sgeexec2 1
470 0.00000 jn-85a72cc chihung qw 07/27/2007 23:16:01 1
471 0.00000 jn-0ecd1c5 chihung qw 07/27/2007 23:16:01 1
472 0.00000 jn-f7b0c72 chihung qw 07/27/2007 23:16:01 1
473 0.00000 jn-c69525b chihung qw 07/27/2007 23:16:01 1
474 0.00000 jn-f7175e4 chihung qw 07/27/2007 23:16:02 1
475 0.00000 jn-f31a24c chihung qw 07/27/2007 23:16:02 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung hr 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung hr 07/27/2007 23:16:01 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
466 0.55500 jn-66ad4b8 chihung hr 07/27/2007 23:16:01 all.q@sgeexec2 1
469 0.55500 jn-260acb5 chihung hr 07/27/2007 23:16:01 all.q@sgeexec2 1
470 0.00000 jn-85a72cc chihung qw 07/27/2007 23:16:01 1
471 0.00000 jn-0ecd1c5 chihung qw 07/27/2007 23:16:01 1
472 0.00000 jn-f7b0c72 chihung qw 07/27/2007 23:16:01 1
473 0.00000 jn-c69525b chihung qw 07/27/2007 23:16:01 1
474 0.00000 jn-f7175e4 chihung qw 07/27/2007 23:16:02 1
475 0.00000 jn-f31a24c chihung qw 07/27/2007 23:16:02 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung hr 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung hr 07/27/2007 23:16:01 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
466 0.55500 jn-66ad4b8 chihung hr 07/27/2007 23:16:01 all.q@sgeexec2 1
469 0.55500 jn-260acb5 chihung hr 07/27/2007 23:16:01 all.q@sgeexec2 1
470 0.00000 jn-85a72cc chihung qw 07/27/2007 23:16:01 1
471 0.00000 jn-0ecd1c5 chihung qw 07/27/2007 23:16:01 1
472 0.00000 jn-f7b0c72 chihung qw 07/27/2007 23:16:01 1
473 0.00000 jn-c69525b chihung qw 07/27/2007 23:16:01 1
474 0.00000 jn-f7175e4 chihung qw 07/27/2007 23:16:02 1
475 0.00000 jn-f31a24c chihung qw 07/27/2007 23:16:02 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung hr 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung hr 07/27/2007 23:16:01 all.q@sgeexec1 1
471 0.55500 jn-0ecd1c5 chihung hr 07/27/2007 23:16:16 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
466 0.55500 jn-66ad4b8 chihung r 07/27/2007 23:16:01 all.q@sgeexec2 1
476 0.00000 jn-9ae8029 chihung qw 07/27/2007 23:16:16 1
477 0.00000 jn-d126937 chihung qw 07/27/2007 23:16:16 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung r 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung hr 07/27/2007 23:16:01 all.q@sgeexec1 1
471 0.55500 jn-0ecd1c5 chihung hr 07/27/2007 23:16:16 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
476 0.00000 jn-9ae8029 chihung qw 07/27/2007 23:16:16 1
477 0.00000 jn-d126937 chihung qw 07/27/2007 23:16:16 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
465 0.55500 jn-0ed40af chihung r 07/27/2007 23:15:46 all.q@sgeexec0 1
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung hr 07/27/2007 23:16:01 all.q@sgeexec1 1
471 0.55500 jn-0ecd1c5 chihung hr 07/27/2007 23:16:16 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
476 0.00000 jn-9ae8029 chihung qw 07/27/2007 23:16:16 1
477 0.00000 jn-d126937 chihung qw 07/27/2007 23:16:16 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung r 07/27/2007 23:16:01 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung r 07/27/2007 23:16:01 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
464 0.55500 jn-1c84996 chihung hr 07/27/2007 23:15:46 all.q@sgeexec1 1
467 0.55500 jn-1ddba59 chihung r 07/27/2007 23:16:01 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
464 0.55500 jn-1c84996 chihung r 07/27/2007 23:15:46 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
464 0.55500 jn-1c84996 chihung r 07/27/2007 23:15:46 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
464 0.55500 jn-1c84996 chihung r 07/27/2007 23:15:46 all.q@sgeexec1 1
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
463 0.55500 five chihung r 07/27/2007 23:15:31 all.q@sgeexec2 1
^C
$ ls tmp
five                                 jn-85a72cc20898241095d1489ebbb02ca7
jn-0ecd1c5c6721976cb9ca96c902fb1044  jn-9ae80293acaf8fc426bf7972ad8f6c38
jn-0ed40afa95dde7d92d3f5784fc397a2c  jn-c69525b4ba477a7ce110aab8a2438c45
jn-156ac6d58000f8c8c18b973959622464  jn-d126937687b91b5f6b14e1b642aedc60
jn-1c849969b8c86e907bd32255058313e6  jn-f31a24c2219734f3bef13f6ce5addf78
jn-1ddba5925ff899f36812527220fe19a9  jn-f7175e4d15c7e08a98a7c13915d0121e
jn-260acb532729915691a6b096eeb69e9b  jn-f7b0c72538b479c4615ad5abd0214660
jn-66ad4b80af12a4b4c7469e76d50406a7
$ cat tmp/five
13
Interesting? For me, definitely.