Saturday, July 28, 2007

Sun Grid Engine (SGE) Solving Recursion

During one of the demonstrations or benchmarks, whatever you want to call it, we were asked to use Sun Grid Engine (SGE) to show how it can be used to solve a recursive calculation.

The formula is like this: In = In-1 + In-2 , if n=0, I0 = 1 , if n=1, I1 = 1

If n=4, anwser is 8; if n=5, answer is 13

Easy, right? Yes, if you are going to program using your favourite programming language. But not using a grid engine, truss me.

Programming in a shell script way, it will be like this, easy peezy!

#! /bin/sh

calculate()
{
      echo `expr $1 + $2`
}


if [ $# -ne 1 ]; then
      echo "Usage: $0 <no>"
      exit 1
fi
num=$1


if [ $num -eq 0 ]; then
      result=1
elif [ $num -eq 1 ]; then
      result=2
else
      d1=`expr $num - 1`
      d2=`expr $num - 2`
      result1=`$0 $d1`
      result2=`$0 $d2`
      result=`calculate $result1 $result2`
fi
echo $result

Let's visualise the tree in order to get a feel of what we are going to deal with. Let's start with n=4

        4
       / \
      3    2
     / \  / \
    2  1  1  0
   / \
  1   0

How can we use SGE to simulate this recursion ? How can the 2 children know who their parent is after the jobs are submitted to execution nodes ? How to keep track of their results ? How can these results get pass back to the parent ? How to eventually stop and wind back up to the root node ?

Yes, these are the questions in my mind when I started off with this exercise. Becasue SGE is not a sub-second scheduler and therefore I can "qsub" 2 jobs and immediate establish the parent-child relationship using the "-hold_jid" (hold job dependency). In order for the children to know what their parent is, I parsed the "-v PARENT=value" in the "qsub", so that they can get the PARENT environment variable.

How to keep track of result at every stage ? I made use of some temporary files, however I need to ensure uniqueness of these files. We can accomplish this by taking a MD5 of the first 32 bytes (can be any size of bytes) in /dev/urandom device in Solaris.

At first, my SGE environment is configured with 12 slots and I can only run n=3 and n=4. When n=5, it simply ran out of slots. Here, I realised that slots/queue is very similar to memory/stack in a computer. In a recursive program, it has to push things onto the stack and let the pop/push of stack to sort itself out. If your stack size is too small, your recursive program will break which is similar to our slots in SGE, except that all those jobs will be on hold. They are going to hold forever until you "gdel" them.

As for n > 5, I have to increase the number of slots to make it work. Below shows the two scripts, namely the one doing all the recursion (rec.sh) and the one doing the calculation (cal.sh). It also shows "qstat" in action every 5 seconds interval. Job name "-N" in "qsub" is needed in order for the final value to be stored in the file "five".

$ cat rec.sh
#! /bin/sh
#$ -cwd
#$ -S /bin/sh
#$ -o /dev/null
#$ -e /dev/null


jndir=tmp
[ ! -d $jndir ] && mkdir $jndir


# loop around qacct to wait for result
getResult()
{
      while :
      do
              qacct -j $1 > /dev/null 2>&1
              if [ $? -eq 0 ]; then
                      cat $jndir/$1
                      return
              fi
              sleep 1
      done

      # it shouldn't reach here
      exit 1
}



if [ $# -ne 1 ]; then
      echo "Usage: $0 "
      exit 1
fi

. /opt/n1ge/default/common/settings.sh


number=$1
jobname=$JOB_NAME


if [ $number -eq 0 ]; then
      result=1
elif [ $number -eq 1 ]; then
      result=2
else
      d1=`expr $number - 1`
      d2=`expr $number - 2`
      jn1="jn-`dd if=/dev/urandom bs=32 count=1 2>/dev/null | digest -a md5`"
      jn2="jn-`dd if=/dev/urandom bs=32 count=1 2>/dev/null | digest -a md5`"

      qsub -N $jn1 -v PARENT=$jobname $0 $d1 > /dev/null 2>&1
      qsub -N $jn2 -v PARENT=$jobname $0 $d2 > /dev/null 2>&1

      # if there is parent-child relationship, establish it
      if [ "X$PARENT" != "X" ]; then
              qalter -hold_jid $jn1,$jn2 $jobname > /dev/null 2>&1
      fi

      result1=`getResult $jn1`
      result2=`getResult $jn2`

      result=`./cal.sh $result1 $result2`
fi


# write result to file
echo $result > $jndir/$jobname

$ cat cal.sh
#! /bin/sh

echo `expr $1 + $2`

$ qsub -N five rec.sh 5
Your job 463 ("five") has been submitted.

$ while :
do
qstat
sleep 5
done
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  464 0.00000 jn-1c84996 chihung      qw    07/27/2007 23:15:31                                    1
  465 0.00000 jn-0ed40af chihung      qw    07/27/2007 23:15:31                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  464 0.00000 jn-1c84996 chihung      qw    07/27/2007 23:15:31                                    1
  465 0.00000 jn-0ed40af chihung      qw    07/27/2007 23:15:31                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  464 0.00000 jn-1c84996 chihung      qw    07/27/2007 23:15:31                                    1
  465 0.00000 jn-0ed40af chihung      qw    07/27/2007 23:15:31                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      hr    07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  466 0.00000 jn-66ad4b8 chihung      qw    07/27/2007 23:15:46                                    1
  467 0.00000 jn-1ddba59 chihung      qw    07/27/2007 23:15:46                                    1
  468 0.00000 jn-156ac6d chihung      qw    07/27/2007 23:15:46                                    1
  469 0.00000 jn-260acb5 chihung      qw    07/27/2007 23:15:46                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      hr    07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  466 0.00000 jn-66ad4b8 chihung      qw    07/27/2007 23:15:46                                    1
  467 0.00000 jn-1ddba59 chihung      qw    07/27/2007 23:15:46                                    1
  468 0.00000 jn-156ac6d chihung      qw    07/27/2007 23:15:46                                    1
  469 0.00000 jn-260acb5 chihung      qw    07/27/2007 23:15:46                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      hr    07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  466 0.00000 jn-66ad4b8 chihung      qw    07/27/2007 23:15:46                                    1
  467 0.00000 jn-1ddba59 chihung      qw    07/27/2007 23:15:46                                    1
  468 0.00000 jn-156ac6d chihung      qw    07/27/2007 23:15:46                                    1
  469 0.00000 jn-260acb5 chihung      qw    07/27/2007 23:15:46                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      hr    07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  466 0.55500 jn-66ad4b8 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec2                     1
  469 0.55500 jn-260acb5 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec2                     1
  470 0.00000 jn-85a72cc chihung      qw    07/27/2007 23:16:01                                    1
  471 0.00000 jn-0ecd1c5 chihung      qw    07/27/2007 23:16:01                                    1
  472 0.00000 jn-f7b0c72 chihung      qw    07/27/2007 23:16:01                                    1
  473 0.00000 jn-c69525b chihung      qw    07/27/2007 23:16:01                                    1
  474 0.00000 jn-f7175e4 chihung      qw    07/27/2007 23:16:02                                    1
  475 0.00000 jn-f31a24c chihung      qw    07/27/2007 23:16:02                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      hr    07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  466 0.55500 jn-66ad4b8 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec2                     1
  469 0.55500 jn-260acb5 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec2                     1
  470 0.00000 jn-85a72cc chihung      qw    07/27/2007 23:16:01                                    1
  471 0.00000 jn-0ecd1c5 chihung      qw    07/27/2007 23:16:01                                    1
  472 0.00000 jn-f7b0c72 chihung      qw    07/27/2007 23:16:01                                    1
  473 0.00000 jn-c69525b chihung      qw    07/27/2007 23:16:01                                    1
  474 0.00000 jn-f7175e4 chihung      qw    07/27/2007 23:16:02                                    1
  475 0.00000 jn-f31a24c chihung      qw    07/27/2007 23:16:02                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      hr    07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  466 0.55500 jn-66ad4b8 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec2                     1
  469 0.55500 jn-260acb5 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec2                     1
  470 0.00000 jn-85a72cc chihung      qw    07/27/2007 23:16:01                                    1
  471 0.00000 jn-0ecd1c5 chihung      qw    07/27/2007 23:16:01                                    1
  472 0.00000 jn-f7b0c72 chihung      qw    07/27/2007 23:16:01                                    1
  473 0.00000 jn-c69525b chihung      qw    07/27/2007 23:16:01                                    1
  474 0.00000 jn-f7175e4 chihung      qw    07/27/2007 23:16:02                                    1
  475 0.00000 jn-f31a24c chihung      qw    07/27/2007 23:16:02                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      hr    07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec1                     1
  471 0.55500 jn-0ecd1c5 chihung      hr    07/27/2007 23:16:16 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  466 0.55500 jn-66ad4b8 chihung      r     07/27/2007 23:16:01 all.q@sgeexec2                     1
  476 0.00000 jn-9ae8029 chihung      qw    07/27/2007 23:16:16                                    1
  477 0.00000 jn-d126937 chihung      qw    07/27/2007 23:16:16                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      r     07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec1                     1
  471 0.55500 jn-0ecd1c5 chihung      hr    07/27/2007 23:16:16 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  476 0.00000 jn-9ae8029 chihung      qw    07/27/2007 23:16:16                                    1
  477 0.00000 jn-d126937 chihung      qw    07/27/2007 23:16:16                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  465 0.55500 jn-0ed40af chihung      r     07/27/2007 23:15:46 all.q@sgeexec0                     1
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      hr    07/27/2007 23:16:01 all.q@sgeexec1                     1
  471 0.55500 jn-0ecd1c5 chihung      hr    07/27/2007 23:16:16 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
  476 0.00000 jn-9ae8029 chihung      qw    07/27/2007 23:16:16                                    1
  477 0.00000 jn-d126937 chihung      qw    07/27/2007 23:16:16                                    1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      r     07/27/2007 23:16:01 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      r     07/27/2007 23:16:01 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  464 0.55500 jn-1c84996 chihung      hr    07/27/2007 23:15:46 all.q@sgeexec1                     1
  467 0.55500 jn-1ddba59 chihung      r     07/27/2007 23:16:01 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  464 0.55500 jn-1c84996 chihung      r     07/27/2007 23:15:46 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  464 0.55500 jn-1c84996 chihung      r     07/27/2007 23:15:46 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  464 0.55500 jn-1c84996 chihung      r     07/27/2007 23:15:46 all.q@sgeexec1                     1
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  463 0.55500 five       chihung      r     07/27/2007 23:15:31 all.q@sgeexec2                     1
^C

$ ls tmp
five                                 jn-85a72cc20898241095d1489ebbb02ca7
jn-0ecd1c5c6721976cb9ca96c902fb1044  jn-9ae80293acaf8fc426bf7972ad8f6c38
jn-0ed40afa95dde7d92d3f5784fc397a2c  jn-c69525b4ba477a7ce110aab8a2438c45
jn-156ac6d58000f8c8c18b973959622464  jn-d126937687b91b5f6b14e1b642aedc60
jn-1c849969b8c86e907bd32255058313e6  jn-f31a24c2219734f3bef13f6ce5addf78
jn-1ddba5925ff899f36812527220fe19a9  jn-f7175e4d15c7e08a98a7c13915d0121e
jn-260acb532729915691a6b096eeb69e9b  jn-f7b0c72538b479c4615ad5abd0214660
jn-66ad4b80af12a4b4c7469e76d50406a7

$ cat tmp/five
13

Interesting ? For me, definitely.

Labels: ,

0 Comments:

Post a Comment

<< Home