Thursday, July 02, 2009

How to find cron queue size

If your system relies heavily on cron(1M) to schedule jobs, you may be interested to know how many cron jobs are running. In Solaris, the default queue size for crontab(1) jobs is 100; you can verify this in the source code.

When cron exceeds the limit, it reports this in its log (/var/cron/log) and cannot schedule any more jobs until the queue size drops below the limit:

! c queue max run limit reached Thu Jul  2 21:22:00 2009
! rescheduling a cron job Thu Jul  2 21:22:00 2009
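
As a rough sketch, the number of times each queue has hit its limit can be tallied from the log. The function name count_limit_hits is made up for illustration, and the pattern assumes the message format shown above, where the second field is the queue letter (c is the cron queue).

```shell
# Sketch only: count_limit_hits is an illustrative name, not a Solaris tool.
# count_limit_hits LOGFILE
# Print "queue count" for every queue that logged a max run limit message.
count_limit_hits() {
    awk '/queue max run limit reached/ { hits[$2]++ }
         END { for (q in hits) print q, hits[q] }' "$1"
}
```

Calling count_limit_hits /var/cron/log prints one line per affected queue, and nothing at all if the limit was never reached.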

To find out how many child processes are running under cron, we need to know cron's pid. If your system runs Solaris containers (zones), ps -ef | grep cron will show more than one cron process. To determine the pid of your own cron exactly, tell pgrep(1) which zone you are in. Once we have cron's pid, a ps listing will show all the child processes with that parent pid.

# pgrep -x -z `zonename` cron
5847

# ptree 5847
5847  /usr/sbin/cron
  13293 sh -c sleep 1000
    13305 sleep 1000
  13295 sh -c sleep 1000
    13306 sleep 1000
  13296 sh -c sleep 1000
    13307 sleep 1000
  ......
    ......

# ps -e -o 'pid,ppid' | nawk -v ppid=5847 '$2==ppid{++s}END{print s+0}'
100

# tail /var/cron/log
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009
! c queue max run limit reached Thu Jul  2 21:29:00 2009
! rescheduling a cron job Thu Jul  2 21:29:00 2009

If you are interested in the cron queue size over time, you can put the commands above in a script that prints the queue size with a timestamp. gnuplot is a very useful tool for visualising time-based data.
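
One possible shape for such a script is sketched below, assuming a POSIX-style shell. The function names count_children and log_queue_size and the AWK variable are my own inventions for illustration. The counting step is the same nawk one-liner used earlier, except that it prints 0 instead of an empty line when there are no children.

```shell
# Sketch only: function names are illustrative, not part of Solaris.
AWK=${AWK:-awk}    # assumption: set AWK=nawk on Solaris, as in the post

# count_children PPID
# Read "pid ppid" lines on stdin and print how many have PPID as parent.
# The ps header line never matches, so it is not counted.
count_children() {
    "$AWK" -v ppid="$1" '$2 == ppid { n++ } END { print n + 0 }'
}

# log_queue_size CRON_PID INTERVAL_SECONDS SAMPLES
# Print one "timestamp count" line per sample.
log_queue_size() {
    i=0
    while [ "$i" -lt "$3" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" \
            "$(ps -e -o pid,ppid | count_children "$1")"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$3" ]; then sleep "$2"; fi
    done
}
```

For example, log_queue_size "$(pgrep -x -z "$(zonename)" cron)" 60 1440 >> cron-queue.dat would take one sample a minute for a day (the file name is arbitrary). gnuplot can then read the first column as a time axis with set xdata time and set timefmt "%Y-%m-%dT%H:%M:%S".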

2 Comments:

Blogger Matt Warner said...

This is no longer true, at least not in Solaris 10. The source code does show that 100 is the default value (line 226 of cron.c) but at line 3082 the code looks for queuedefs and loads values from that file. This is also proven out in testing. I hope someone else will find this useful when dealing with the max run message.

7:16 AM  
Blogger chihungchan said...

Yes, you are right. Most people do not fiddle with queuedefs, for a reason: in my environment, jobs do occasionally hang, and you do not want jobs to pile up in the queue.

In fact, this is one of the monitoring parameters in our Nagios implementation.

10:06 AM  
