Sunday, May 29, 2011

Monitor Solaris Zones with prstat

Solaris provides an interactive command-line tool, prstat, that can monitor zone utilisation when you give it the "-Z" flag. The screen then shows both per-process and per-zone statistics, sized to fit your terminal window.
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
  8632 chihung  7204K 6792K cpu10   49    0   0:00:00 0.0% prstat/1
   584 root     1696K  684K sleep   59    0   0:00:00 0.0% smcboot/1
   260 root     2008K 1244K sleep   59    0   0:00:00 0.0% ttymon/1
   228 root     2264K  924K sleep   59    0   0:01:16 0.0% cron/1
   112 root     3524K 2600K sleep   59    0   0:00:00 0.0% picld/5
   110 root     2140K 1288K sleep   59    0   0:00:00 0.0% syseventd/14
   255 root     1696K  904K sleep   59    0   0:00:05 0.0% sac/1
   231 daemon   2280K  940K sleep   59    0   0:00:00 0.0% rpcbind/1
   130 root     7952K 6216K sleep   59    0   0:00:03 0.0% devfsadm/72
 22936 root     3348K 1356K sleep   59    0   0:00:00 0.0% sshd/1
  9186 root     4424K 2508K sleep   59    0   0:00:00 0.0% htt_server/2
   237 daemon   2028K 1276K sleep   60  -20   0:00:00 0.0% nfs4cbd/2
   138 daemon   4076K 2188K sleep   59    0   0:09:56 0.0% kcfd/4
   238 daemon    495M  494M sleep   59    0   0:06:46 0.0% nfsmapid/4
   137 root     1312K  888K sleep   59    0   0:00:00 0.0% powerd/2
     9 root     9672K 8720K sleep   59    0   0:03:00 0.0% svc.configd/17
     7 root       12M   10M sleep   59    0   0:02:47 0.0% svc.startd/13
   239 daemon   2336K 1512K sleep   59    0   0:00:00 0.0% statd/1
ZONEID    NPROC  SIZE   RSS MEMORY      TIME  CPU ZONE                        
     0      123 1489M 1008M   3.0%  10:01:44 0.0% global                      
    26       29   90M   56M   0.1%   0:18:41 0.0% john                      
    27       33  169M  123M   0.3%   0:23:04 0.0% mark                 
     5       31  155M  118M   0.3%   0:18:47 0.0% node102                     
     2       32  161M  122M   0.3%   1:21:49 0.0% sgeexec2                    
    62       33  168M  122M   0.3%   0:22:38 0.0% henry                     
    19       29   89M   54M   0.1%   0:18:56 0.0% peter                     
Total: 2221 processes, 8427 lwps, load averages: 1.65, 4.26, 2.60

With the "-c" flag, prstat prints each new report below the previous one instead of overwriting it. With "-n 1,99999", it limits the process section to 1 line while allowing up to 99999 lines of zone statistics, which should be enough for all your zones. You can then pipe the output to AWK to extract the MEMORY and CPU columns. If you schedule this task to start at midnight daily with a sampling interval of 300 seconds and 288 samples (300 × 288 = 86,400 seconds), you cover a whole day of monitoring for all your zones.
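A minimal sketch of that AWK step, assuming the column layout shown above (MEMORY in field 5, CPU in field 7, ZONE in field 8 of each zone row). The helper name zone_rows and the AWK variable are illustrative only. On Solaris, /usr/bin/awk is the old awk without gsub (which is why the script below uses nawk), so run the sketch there with AWK=nawk.

```shell
#!/bin/sh
# Hypothetical helper: read "prstat -c -Z" output on stdin and print
# ZONE,MEMORY,CPU for every zone row, with the % signs stripped.
# AWK defaults to awk; set AWK=nawk on Solaris (the old awk lacks gsub).
AWK=${AWK:-awk}

zone_rows() {
    $AWK '
    /ZONEID/  { inzone = 1; next }   # zone summary header starts a block
    /^Total:/ { inzone = 0; next }   # the "Total:" line ends one sample
    inzone {
        gsub("%", "")                # "3.0%" becomes "3.0"; fields are re-split
        printf("%s,%s,%s\n", $8, $5, $7)   # ZONE, MEMORY, CPU
    }'
}

# One sample every 300 s for 288 samples: 300 * 288 = 86400 s, i.e. 24 hours.
interval=300
samples=`expr 86400 / $interval`

# Real use on a Solaris host:
#   prstat -c -Z -n 1,99999 $interval $samples | zone_rows
```

Process lines are skipped because they appear before the ZONEID header in each report. To run the whole-day collection from cron, an entry along the lines of "0 0 * * * /path/to/zstat.sh 300 288 > /var/tmp/zstat.csv" would do, where the path and output file are placeholders; note that cron turns an unescaped % in the command field into a newline.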

Here is a script to convert the prstat zone data to CSV:

#! /bin/ksh


export PATH=/usr/bin:/bin:/usr/sbin:/sbin


# zone IDs, stored as a comma-separated list (trailing comma removed)
zids=`zoneadm list -v | awk 'NR>1{printf("%d,",$1)}'`
zids=${zids%,}

# zone names, stored as a comma-separated list (trailing comma removed)
znames=`zoneadm list -v | awk 'NR>1{printf("%s,",$2)}'`
znames=${znames%,}


# $1 = sampling interval in seconds, $2 = number of samples
prstat -c -Z -n 1,99999 $1 $2 | nawk -v zids="$zids" -v znames="$znames" '
# print header (mem/* and cpu/*)
BEGIN {
 n=split(zids,a,",")
 m=split(znames,b,",")
 # %mem ($5)
 for(i=1;i<=n;++i){
  printf("mem/%s,",b[i])
 }
 # %cpu ($7)
 for(i=1;i<=n;++i){
  printf("cpu/%s,",b[i])
 }
 printf("\n")
}
# store this sample's MEMORY ($5) and CPU ($7), keyed by zone ID
/ZONEID/,/Total/ {
 if ( $0 !~ /ZONEID/ && $0 !~ /Total/ ) {
  gsub("%","")
  mem[$1]=$5
  cpu[$1]=$7
 }
}
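# at the end of each sample, print one CSV row
/Total/ {
 for(i=1;i<=n;++i) {
  printf("%s,",mem[a[i]])
 }
 for(i=1;i<=n;++i) {
  printf("%s,",cpu[a[i]])
 }
 printf("\n")
 # clear the values so a zone missing from the next sample
 # is not reported with stale numbers
 for(k in mem) delete mem[k]
 for(k in cpu) delete cpu[k]
}'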
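The script calls zoneadm twice to build the ID and name lists. A sketch of doing it in one pass, assuming the same "zoneadm list -v" layout the script relies on (one header line, zone ID in column 1, zone name in column 2); zone_lists and split_lists are hypothetical helper names. Zone names cannot contain spaces, so a single space can safely separate the two lists.

```shell
#!/bin/sh
# Hypothetical helper: read "zoneadm list -v" output on stdin and print
# "IDS NAMES", both comma-separated, e.g. "0,26 global,john".
# The separator is placed before every item except the first,
# so there is no trailing comma to trim afterwards.
zone_lists() {
    awk 'NR > 1 { ids = ids sep $1; names = names sep $2; sep = "," }
         END    { print ids, names }'
}

# Split that single line into the two variables the script uses.
# Real use on Solaris:  lists=`zoneadm list -v | zone_lists`
split_lists() {
    zids=${1% *}      # everything before the space
    znames=${1#* }    # everything after the space
}
```

Both the awk program and the ${var% *} / ${var#* } expansions work in ksh as well as in a POSIX sh.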

Let's run it with an interval of 10 seconds and 4 samples. If you have 3 non-global zones on this machine, the output will look like this:

./zstat.sh 10 4
mem/global,mem/zone1,mem/zone2,mem/zone3,cpu/global,cpu/zone1,cpu/zone2,cpu/zone3,
3.2,1.2,1.0,1.0,23.0,5.0,7.0,9.0,2.0,
4.7,1.5,1.7,1.5,31.0,15.0,6.0,9.0,1.0,
11.0,5.7,2.3,3.0,40.0,5.0,22.0,13.0,5.0,
13.2,7.2,4.0,2.0,34.0,5.0,17.0,10.0,7.0,
I ran this script on my machines with 67 zones and imported the CSV into OpenOffice. Here is the sample output:
