Saturday, September 27, 2008

Paste a Few Files Into One and Plot ... Take 2

My previous post may help you merge a few files together, but it only works on the assumption that all files have the same number of records and that corresponding records are sampled at the same time stamp. As we all know, the world is not perfect and you are bound to come across various kinds of data sets. Some may have different start and end times, some may have missing data during certain intervals, and some may have a few extra data points within a sampling interval. My previous script will certainly break on these.

I tried to build on my previous work for this new requirement, but I was not getting anywhere and the code started to turn into 'spaghetti'. My new approach is to work with epoch time as an integer instead of the YYYYmmddHHMMSS time format as a string. Hey, this is another chance for me to brush up my Python skills. In this exercise, I tapped into the wealth of Python modules (e.g. sets, datetime, time) and they proved to be very handy.

Below is my data set, which I am trying to put side by side for easy comparison. As you can see, the files all have different sampling times, start times and end times. My Python program not only merges the 3 files together, it also rounds the timestamp down to the defined resolution (60 seconds here). An average value is calculated if more than one sample falls within the same resolution interval (see the last 2 lines in t1.txt). A file with no sample in an interval is reported as 0.

$ paste t1.txt t2.txt t3.txt
20080725101223:14       20080725083225:23       20080725083225:32
20080725102225:15       20080725084230:21       20080725084230:32
20080725103227:15       20080725085228:29       20080725085228:34
20080725104233:19       20080725090235:29       20080725090235:36
20080725105235:18       20080725091233:28       20080725091233:37
20080725110236:11       20080725092232:27       20080725092232:37
20080725111237:12       20080725093225:27       20080725093225:37
20080725112231:12       20080725094229:28       20080725094229:33
20080725113241:13       20080725095225:26       20080725095225:32
20080725114236:14       20080725100221:21       20080725100221:34
20080725115241:14       20080725115241:21       20080725101223:33
20080725120235:17       20080725120235:21       20080725102225:33
20080725121231:19       20080725121231:22       20080725103227:33
20080725122232:12       20080725122232:23       20080725104233:37
20080725123238:13       20080725123238:23       20080725105235:37
20080725124237:14       20080725124237:23       20080725110236:37
20080725125237:15       20080725125237:22       20080725111237:39
20080725130239:16       20080725130239:22       20080725112231:34
20080725131233:16       20080725131233:22
20080725132226:14       20080725132226:22
20080725132236:99

$ ./merge2.py t1.txt t2.txt t3.txt
20080725T083200:0:23:32
20080725T084200:0:21:32
20080725T085200:0:29:34
20080725T090200:0:29:36
20080725T091200:0:28:37
20080725T092200:0:27:37
20080725T093200:0:27:37
20080725T094200:0:28:33
20080725T095200:0:26:32
20080725T100200:0:21:34
20080725T101200:14:0:33
20080725T102200:15:0:33
20080725T103200:15:0:33
20080725T104200:19:0:37
20080725T105200:18:0:37
20080725T110200:11:0:37
20080725T111200:12:0:39
20080725T112200:12:0:34
20080725T113200:13:0:0
20080725T114200:14:0:0
20080725T115200:14:21:0
20080725T120200:17:21:0
20080725T121200:19:22:0
20080725T122200:12:23:0
20080725T123200:13:23:0
20080725T124200:14:23:0
20080725T125200:15:22:0
20080725T130200:16:22:0
20080725T131200:16:22:0
20080725T132200:56.5:22:0

Here is my merge2.py code:

#! /usr/bin/python
#
# merge files together based on round-off timestamp
# no assumption taken regarding having same timestamp for each corresponding row
# data file format: yyyymmddHHMMSS:value
#


import sys, os
import datetime, time
import sets


if len(sys.argv) < 3:
    sys.stderr.write("Usage: %s file1 file2 [file ...]\n" % (sys.argv[0]))
    sys.exit(1)



def avg(n):
    if len(n)==0:
        return 0
    else:
        return float(sum(n,0.0))/len(n)


resolution = 60


#
# tset - stores the time (round off based on resolution) in epoch
# data - dict with key=[filename,epoch], value=list of values
#
tset = sets.Set()
data = {}
outfmt = '%s'
for f in sys.argv[1:]:
    outfmt = '%s:%s' % (outfmt, '%g')
    for line in open(f):
        ts,value = line.strip().split(':')

        yr  = int(ts[0:4])
        mth = int(ts[4:6])
        day = int(ts[6:8])
        hr  = int(ts[8:10])
        mn  = int(ts[10:12])   # named mn to avoid shadowing the built-in min()
        sec = int(ts[12:14])

        t = datetime.datetime(yr, mth, day, hr, mn, sec)
        epoch = int(time.mktime(t.timetuple()))
        # round down to the start of the resolution interval
        epoch = epoch//resolution*resolution
        tset.add(epoch)
        if (f,epoch) not in data:
            data[f,epoch]=[]
        data[f,epoch].append(int(value))


for t in sorted(tset):
    tt = time.localtime(t)
    ts = "%04d%02d%02dT%02d%02d%02d" % (tt[0:6])
    lvalue = [ts]

    for f in sys.argv[1:]:
        if (f,t) in data:
            lvalue.append( avg(data[f,t]) )
        else:
            lvalue.append(0)
    print outfmt % tuple(lvalue)
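
For readers on Python 3, here is a minimal sketch of the same idea (round each timestamp down to the resolution, collect the values per file and interval, then average). It is not a drop-in replacement for merge2.py: the names `to_bucket` and `merge` are made up for illustration, the input is passed as lists of lines rather than file names, and the timestamps are treated as UTC (merge2.py uses local time) so that the result does not depend on the machine's time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_bucket(ts, resolution=60):
    """Convert 'YYYYmmddHHMMSS' to epoch seconds, rounded down to the resolution."""
    t = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return int(t.timestamp()) // resolution * resolution


def merge(series, resolution=60):
    """series: one list of 'timestamp:value' lines per input file.

    Returns merged lines of the form 'YYYYmmddTHHMMSS:v1:v2:...'.
    """
    data = defaultdict(list)   # (file index, bucket) -> values seen in that bucket
    buckets = set()
    for idx, lines in enumerate(series):
        for line in lines:
            ts, value = line.strip().split(":")
            b = to_bucket(ts, resolution)
            buckets.add(b)
            data[idx, b].append(int(value))

    out = []
    for b in sorted(buckets):
        stamp = datetime.fromtimestamp(b, timezone.utc).strftime("%Y%m%dT%H%M%S")
        fields = [stamp]
        for idx in range(len(series)):
            values = data.get((idx, b), [])
            # average of the samples in this interval, 0 if the file has none
            avg = sum(values) / len(values) if values else 0
            fields.append("%g" % avg)
        out.append(":".join(fields))
    return out


# Last two lines of t1.txt and the matching line of t2.txt
t1 = ["20080725132226:14", "20080725132236:99"]
t2 = ["20080725132226:22"]
for line in merge([t1, t2]):
    print(line)
```

The two t1.txt samples fall into the same one-minute interval, so they are averaged into a single value, as in the last line of the merged output above.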

The plot below was created with my generic plotting tool. BTW, if you do not need the merged output as text, you can simply populate the data into RRDtool. IBM developerWorks has a very nice article on RRDtool - Expose Web performance problems with the RRDtool


Thursday, September 25, 2008

For The Father-to-be

Going to be a father and worried that you may not have enough time to keep up with IT technologies? Here is a tip from me: invest in quality books (those from O'Reilly are worth buying) so that you can read them anytime, anywhere. Can you see my lex and yacc book lying on the desk? BTW, the photo was taken 10+ years ago.









Paste a Few Files Into One and Plot ...

Suppose you have been monitoring a few parameters of a service periodically and writing them to separate files. Each file holds colon-separated key-value pairs, and the key is a timestamp in YYYYmmddHHMMSS format. Now you are required to plot a few of these parameters together in a single plot.

Example:
File 1 Line 1 = 20080910111213:14
File 2 Line 1 = 20080910111213:11
File 3 Line 1 = 20080910111213:23
and you are required to put them together in one line so that you can reuse your generic plotting tool to visualise all these values in a single graph.
Output Line 1 = 20080910111213:14:11:23

A couple of assumptions in this work:

  • data in each file is colon separated with
    1st field=time stamp format in YYYYmmddHHMMSS
    2nd field=value
  • all files have the same time stamp in each corresponding row
Here is the script:
#! /bin/sh

if [ $# -lt 2 ]; then
 echo "Usage: $0 file file [file ...]"
 exit 1
fi


#
# assumptions:
# 1. assume data in each file is colon separated with
#    1st field: time stamp format in YYYYmmddHHMMSS
#    2nd field: value
#    Eg. 20080909121314:2
# 2. all files are having the same time stamp in each corresponding row
#


prefix=".tmprrd-$$"
sep=":"


#
# modify the time stamp for easy parsing in Tcl (clock scan)
#
count=1
suffix=`echo $count | awk '{printf("%03d",$1)}'`
awk -F"$sep" '{printf("%sT%s:%d\n",substr($1,1,8),substr($1,9,6),$2)}' "$1" > \
 ${prefix}-${suffix}
count=`expr $count + 1`


#
# for each file (starting from 2nd), extract 2nd field to individual file
#
shift
for i in "$@"
do
 suffix=`echo $count | awk '{printf("%03d",$1)}'`
 awk -F"$sep" '{print $2}' "$i" > ${prefix}-${suffix}
 count=`expr $count + 1`
done


#
# 'paste' them together
#
paste -d "$sep" ${prefix}-*


#
# cleanup
#
rm -f ${prefix}-*

As you can see, I modified the time stamp from the first file and stored the result in a temporary file with a suffix of "-001". The time stamp is reformatted into a form that Tcl's clock scan accepts, because the generic plotting tool is implemented in Tcl. As for the values in each remaining file (starting from the 2nd file 'cos the first one has been taken care of), I write them out separately to files whose suffix is a 3-digit (zero padded) running number. This lets me take advantage of the shell wild card, which expands in sorted order, to ensure the sequence of the values corresponds to the sequence of the input files. With this, I can just paste ${prefix}-*.
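
The same row-by-row merge can be sketched in Python with `zip`, which avoids the temporary files. This is only an illustration under the same assumptions as the script (every file has the same time stamp in each corresponding row). The function name `paste` and the list-of-lines input are made up for the example.

```python
def paste(files, sep=":"):
    """files: one list of 'YYYYmmddHHMMSS:value' lines per input file.

    Takes the time stamp from the first file only and appends the value
    from every file, like the shell script above.
    """
    out = []
    # zip() walks the files row by row and stops at the shortest one
    for rows in zip(*files):
        ts = rows[0].split(sep)[0]
        values = [row.strip().split(sep)[1] for row in rows]
        # insert 'T' between date and time, as the awk command does
        out.append(sep.join([ts[:8] + "T" + ts[8:]] + values))
    return out


file1 = ["20080910111213:14"]
file2 = ["20080910111213:11"]
file3 = ["20080910111213:23"]
print("\n".join(paste([file1, file2, file3])))
```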

The output is a beautiful graph.


Monday, September 22, 2008

In Praise of Scripting: Real Programming Pragmatism

The IEEE Computer Society's July 2008 edition has published an article - In Praise of Scripting: Real Programming Pragmatism (you need to pay $19 to IEEE to view the original article; alternatively you can click here to view the draft). In the abstract, the author recommends that scripting, not Java, be taught first, asserting that students should learn to love their own possibilities before they learn to loathe other people's restrictions.

Eric Wendelin, a Software Engineer at Sun Microsystems, wrote a related post in his blog: What I wanted to know before I left college: A programmer reflects. As he pointed out:

" ... Summer internship that forced me to use computing languages that I had not touched before: Perl, PHP, and other CL tools in a mostly command-line Linux environment - Going outside your comfort zone ended up being HUGE in my career because I realized how to pick up technologies and try to build something useful with them. ..."

I totally agree that computer science students should be exposed to various types of programming languages / operating systems during their undergraduate study. BTW, if a language or an operating system can survive that long (UNIX was first developed in 1969, Bourne shell was released in 1977, AWK was written in 1977, Perl / Python / Tcl were developed in the late 1980s), it surely has its own niche area.


Thursday, September 18, 2008

Beauty of UNIX Sub-Shell

UNIX sub-shell "(...)" is extremely useful in a couple of situations:
  1. group the output of a few commands and have it processed by another command
    (cat /etc/hosts; uname -a; df -k) | wc -l
  2. execute a command in a different directory without having to 'cd' back to your original directory
    (cd /some/other/directory; ./run-a-command > output.txt)
  3. run a group of commands in background
    (run1.sh; run2.sh; run3.sh) &

By applying some of these principles, you can do quite a few amusing things. Did you know that you can combine a sub-shell and 'tar' to copy a directory recursively:

$ (cd /usr/local; tar cpf - .) | (cd $HOME/local; tar xpf -)

Suppose you need to 'talk SMTP' protocol with the mail server directly:

$ telnet mail.some.mail.server.com 25
Trying 1.2.3.4...
Connected to mail.some.mail.server.com.
Escape character is '^]'.
220-mailsrv.some.mail.server.com -- Server ESMTP (SomeServer Mail Service)
220 Authorised Use Only
HELO chihungchan.blogspot.com
250 mailsrv.some.mail.server.com OK, [203.166.139.133].
MAIL FROM: chihung@some.mail.server.com
250 2.5.0 Address Ok.
RCPT TO: chihung@some.mail.server.com
250 2.1.5 chihung@some.mail.server.com OK.
DATA
354 Please start mail input.
From: chihung@some.mail.server.com
To: chihung@some.mail.server.com
Subject: sub shell

testing sub shell
.
250 Mail queued for delivery.
QUIT
221 Closing connection. Good bye.
Connection closed by foreign host.

You may think that simply 'echo'-ing all these input commands and piping them to the remote SMTP port will work. Think again. When you echo something, the system immediately writes the text to standard output and the command finishes. However, telnet to some.mail.server.com may take a while: first it has to resolve the hostname to an IP address, then establish the TCP connection to the remote server. All this may take more than a second, depending on how busy your DNS server is, how far away the remote host is and how busy the network is. So by the time some.mail.server.com is ready to talk SMTP, the data is gone. To avoid this situation, you can echo the SMTP commands and then sleep for a while in a sub-shell while telnet is establishing the connection.

The following will NOT be able to send the email, for the reason above:

$ echo "HELO chihungchan.blogspot.com
MAIL FROM: chihung@some.mail.server.com
RCPT TO: chihung@some.mail.server.com
DATA
From: chihung@some.mail.server.com
To: chihung@some.mail.server.com
Subject: sub shell

testing sub shell
.
QUIT" | telnet mail.some.mail.server.com 25
Trying 1.2.3.4...
Connected to mail.some.mail.server.com.
Escape character is '^]'.
Connection closed by foreign host.

This will work by applying the sub-shell trick:

$ (echo "HELO chihungchan.blogspot.com
MAIL FROM: chihung@some.mail.server.com
RCPT TO: chihung@some.mail.server.com
DATA
From: chihung@some.mail.server.com
To: chihung@some.mail.server.com
Subject: sub shell

testing sub shell
.
QUIT"; sleep 5) | telnet mail.some.mail.server.com 25
Trying 1.2.3.4...
Connected to mail.some.mail.server.com.
Escape character is '^]'.
220-mailsrv.some.mail.server.com -- Server ESMTP (SomeServer Mail Service)
220 Authorised Use Only
250 mailsrv.some.mail.server.com OK, [1.2.3.4].
250 2.5.0 Address Ok.
250 2.1.5 chihung@some.mail.server.com OK.
354 Please start mail input.
250 Mail queued for delivery.
221 Closing connection. Good bye.
Connection closed by foreign host.
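
The effect of the trailing sleep can be observed without a mail server. The sketch below is only an illustration: it uses `cat` as a stand-in for telnet (an assumption, since telnet needs a real host), and the helper name `run_timed` is made up. It shows that both pipelines deliver the same data, but with the sub-shell the reading end does not see end-of-file until the sleep has finished, which is what gives the remote end time to answer.

```python
import subprocess
import time


def run_timed(command):
    """Run a shell pipeline and return (stdout, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout, time.monotonic() - start


# Plain echo: the writer exits at once, so the reader sees end-of-file immediately.
out_fast, t_fast = run_timed('echo "QUIT" | cat')

# Sub-shell with sleep: same data, but the pipe stays open for one more second.
out_slow, t_slow = run_timed('(echo "QUIT"; sleep 1) | cat')

print(repr(out_fast), round(t_fast, 2))
print(repr(out_slow), round(t_slow, 2))
```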

You can do the same trick for HTTP protocol.

$ (echo "GET / HTTP/1.1
Host: chihungchan.blogspot.com
"; sleep 5) | telnet chihungchan.blogspot.com 80 > /tmp/aa 2>&1

$ head /tmp/aa
Trying 209.85.133.191...
Connected to chihungchan.blogspot.com.
Escape character is '^]'.
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Last-Modified: Mon, 15 Sep 2008 02:19:49 GMT
Cache-Control: max-age=0 private
ETag: "05983cb0-d1a4-47e0-bce1-37b9f2e233db"
Transfer-Encoding: chunked
Date: Thu, 18 Sep 2008 09:23:29 GMT
Server: GFE/1.3

58a
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/x
html1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Chi Hung Chan</title>

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">

If you are running Linux, you can explore the nc command. Doing it the 'nc' way as suggested in the man page does not work with my office mail server; it gives a "220 Authorised Use Only" error message. If you know why, I would appreciate it if you could leave me a comment. BTW, Solaris does not have the nc command installed.


Tuesday, September 09, 2008

Memory Utilisation In Linux, Part 2

Wanna see the symptoms of memory shortage? Here is a Python script to 'swallow' 10MB of memory every second:
#! /usr/bin/python

import time

a=[]
while 1:
        a.append('a'*10*1024*1024)
        time.sleep(1)
While the above script is running in the background, here is the output from sar -r 5 100:
# sar -r 5 100
Linux 2.6.18-8.1.8.el5 (chihung)        09/09/2008

07:14:29 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:14:34 PM     56396   2019244     97.28    128784   1545336   2031524        84      0.00         0
07:14:39 PM     54876   2020764     97.36    128608   1495564   2031524        84      0.00         0
07:14:44 PM     54704   2020936     97.36    127972   1445768   2031524        84      0.00         0
07:14:49 PM     55540   2020100     97.32    127496   1394408   2031524        84      0.00         0
07:14:54 PM     55600   2020040     97.32    126932   1343752   2031524        84      0.00         0
07:14:59 PM     54452   2021188     97.38    126624   1294228   2031524        84      0.00         0
07:15:04 PM     53116   2022524     97.44    126304   1245076   2031524        84      0.00         0
07:15:09 PM     54484   2021156     97.38    125884   1193316   2031524        84      0.00         0

07:15:09 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:15:14 PM     52784   2022856     97.46    125300   1144816   2031524        84      0.00         0
07:15:19 PM     54800   2020840     97.36    124684   1092436   2031524        84      0.00         0
07:15:24 PM     54920   2020720     97.35    124200   1052136   2031524        84      0.00         0
07:15:29 PM     51652   2023988     97.51    123724   1004592   2031524        84      0.00         0
07:15:34 PM     52280   2023360     97.48    123368    964236   2031524        84      0.00         0
07:15:39 PM     54624   2021016     97.37    122944    912644   2031524        84      0.00         0
07:15:44 PM     53756   2021884     97.41    122576    874024   2031524        84      0.00         0

07:15:44 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:15:49 PM     52632   2023008     97.46    122440    825380   2031524        84      0.00         0
07:15:54 PM     53488   2022152     97.42    122288    775016   2031524        84      0.00         0
07:15:59 PM     52632   2023008     97.46    122068    727192   2031524        84      0.00         0
07:16:04 PM     53220   2022420     97.44    121928    678084   2031524        84      0.00         0
07:16:09 PM     54432   2021208     97.38    121792    627572   2031524        84      0.00         0
07:16:14 PM     52196   2023444     97.49    121436    581396   2031524        84      0.00         0
07:16:19 PM     53572   2022068     97.42    121068    531860   2031524        84      0.00         0

07:16:19 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:16:24 PM     52192   2023448     97.49    120800    485084   2031524        84      0.00         0
07:16:29 PM     51588   2024052     97.51    109316    448756   2031524        84      0.00         0
07:16:34 PM     52528   2023112     97.47    103308    414500   2031524        84      0.00         0
07:16:39 PM     55024   2020616     97.35     96268    369416   2031524        84      0.00         0
07:16:44 PM     54404   2021236     97.38     87060    330008   2031524        84      0.00         0
07:16:49 PM     51752   2023888     97.51     79648    290004   2031524        84      0.00         0
07:16:54 PM     53224   2022416     97.44     68392    250512   2031524        84      0.00         0
07:16:59 PM     54384   2021256     97.38     59184    208384   2031524        84      0.00         0

07:16:59 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:17:04 PM     55544   2020096     97.32     46008    171400   2031524        84      0.00         0
07:17:09 PM     54532   2021108     97.37     28476    141884   2031396       212      0.01       128
07:17:14 PM     54372   2021268     97.38     12612    120860   2030756       852      0.04       128
07:17:19 PM     52628   2023012     97.46      2252     85396   2030116      1492      0.07       128
07:17:24 PM     51716   2023924     97.51       144     43988   2009848     21760      1.07     16244
07:17:29 PM     60688   2014952     97.08       208     44044   1956132     75476      3.72     14236
07:17:34 PM     54932   2020708     97.35       216     44068   1915068    116540      5.74     12644

07:17:34 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:17:39 PM     53980   2021660     97.40       216     44160   1878736    152872      7.52      1340
07:17:44 PM     55072   2020568     97.35       212     44208   1823232    208376     10.26     16704
07:17:49 PM     51976   2023664     97.50       220     44276   1779172    252436     12.43     13948
07:17:54 PM     52052   2023588     97.49       228     44176   1734720    296888     14.61      8260
07:17:59 PM     53116   2022524     97.44       236     44104   1671496    360112     17.73     19380
07:18:04 PM     51824   2023816     97.50       268     44012   1628964    402644     19.82     13436
07:18:09 PM     56040   2019600     97.30       276     44396   1557816    473792     23.32     40076

07:18:09 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
07:18:14 PM     53260   2022380     97.43       288     44364   1544164    487444     23.99      6492
07:18:19 PM     52592   2023048     97.47       288     42052   1473880    557728     27.45     27956
07:18:24 PM   1878472    197168      9.50       372     40140   1943788     87820      4.32      2520
07:18:29 PM   1878596    197044      9.49       384     40108   1943788     87820      4.32      2520
07:18:34 PM   1878844    196796      9.48       384     40108   1943788     87820      4.32      2520
07:18:39 PM   1878968    196672      9.48       392     40108   1943788     87820      4.32      2520
07:18:44 PM   1878968    196672      9.48       396     40168   1943788     87820      4.32      2520
07:18:49 PM   1878968    196672      9.48       404     40184   1943788     87820      4.32      2520

I terminated the Python script at around 07:18:24. As you can see, although %memused started at 97%, it was not until about 2.5 minutes later that swap kicked in. By then about 1.5GB of memory had been allocated to the Python program. During these 2.5 minutes, the system released whatever was not necessary in the memory cache (watch kbbuffers and kbcached shrink) to make room for the Python program. That's why you do not see any memory shortage or swapping for the first two and a half minutes. However, as the Python program kept on 'swallowing' more memory, the system ran out of memory cache to release. At this stage, the system had to resort to swapping memory pages out to the swap partition (that is, to disk). As you can see, 'kbswpused' and '%swpused' keep going up, and this is a clear indication of memory shortage.

When we terminate our Python program, all memory previously allocated in the heap is returned to the system. During its entire run, the Python program 'swallowed' almost 1.8GB of memory, and that's why you see %memused drop to less than 10%.
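
As a rough cross-check of the sar output above, the sketch below compares how fast kbbuffers + kbcached shrank with the 10MB per second that the script allocates. The two rows are copied from the sar output (07:14:34 and 07:17:04, the last sample before kbswpused starts to move). The variable names are made up for illustration.

```python
# (seconds since midnight, kbbuffers, kbcached), copied from the sar output
first = ((19 * 60 + 14) * 60 + 34, 128784, 1545336)   # 07:14:34 PM
last = ((19 * 60 + 17) * 60 + 4, 46008, 171400)       # 07:17:04 PM

elapsed = last[0] - first[0]
# buffers + cache given up by the kernel between the two samples, in KB
released_kb = (first[1] + first[2]) - (last[1] + last[2])
rate_mb_per_s = released_kb / 1024 / elapsed

print("elapsed seconds:", elapsed)
print("buffers + cache released (MB):", round(released_kb / 1024))
print("release rate (MB/s):", round(rate_mb_per_s, 2))
```

The release rate comes out a little under 10MB per second, which is in line with the cache being given up at roughly the pace the script allocates memory.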


Memory Utilisation In Linux

Yesterday one of my customers asked why Linux memory utilisation always remains high. In order to convince everyone (including myself), I conducted the experiment below on my Linux box (with 2GB memory). Basically the system is reporting a very high %memused of 96.92% with only 63,916 KBytes of free memory. So the questions are: Is this system really running low on memory? Do I need to add more memory? What is the best indicator of memory shortage?

If you look at the man page of proc (man proc), you will find that you can ask the kernel to drop its caches from memory:

       /proc/sys/vm/drop_caches (since Linux 2.6.16)
              Writing  to  this  file  causes the kernel to drop clean caches,
              dentries and inodes from memory, causing that memory  to  become
              free.

              To  free  pagecache,  use  echo 1 > /proc/sys/vm/drop_caches; to
              free dentries and inodes, use echo 2 > /proc/sys/vm/drop_caches;
              to   free   pagecache,   dentries  and  inodes,  use  echo  3  >
              /proc/sys/vm/drop_caches.

              Because this is a non-destructive operation  and  dirty  objects
              are not freeable, the user should run sync(8) first.

Step 1: Tell the kernel to drop all caches, sync the system and write 0 back to drop_caches. You will see that %memused dropped from 96.92 to 22.73, and kbbuffers/kbcached dropped as well. This will be the baseline for the rest of my tests. (Note that the man page quoted above recommends running sync before dropping the caches.)

# grep MemTotal /proc/meminfo
MemTotal:      2075640 kB


# sar -r 1
Linux 2.6.18-8.1.8.el5 (chihung)        09/09/2008

03:00:06 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:00:07 PM     63916   2011724     96.92    133656   1455556   2031524        84      0.00         0
Average:        63916   2011724     96.92    133656   1455556   2031524        84      0.00         0


# cat /proc/sys/vm/drop_caches
0


# echo 3 > /proc/sys/vm/drop_caches


# sync


# sar -r 1 1
Linux 2.6.18-8.1.8.el5 (chihung)        09/09/2008

03:03:32 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:03:33 PM   1603800    471840     22.73       488     49512   2031524        84      0.00         0
Average:      1603800    471840     22.73       488     49512   2031524        84      0.00         0

# echo 0 > drop_caches


# cat drop_caches
0
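
The 'before' figures from Step 1 can be broken down to see how much memory is in use once buffers and cache are set aside. The sketch below is an illustration only: the function name `used_excluding_cache` is made up, the sample text is assembled from the sar values above rather than copied from a real /proc/meminfo, and treating all buffers and cache as reclaimable is a simplification.

```python
def used_excluding_cache(meminfo_text):
    """Return (KB in use, percentage of total), leaving out buffers and page cache."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name] = int(rest.split()[0])   # value in kB
    total = fields["MemTotal"]
    in_use = total - fields["MemFree"] - fields["Buffers"] - fields["Cached"]
    return in_use, 100.0 * in_use / total


# Hypothetical /proc/meminfo excerpt built from the first sar sample in Step 1
sample = """MemTotal:      2075640 kB
MemFree:         63916 kB
Buffers:        133656 kB
Cached:        1455556 kB
"""

in_use, pct = used_excluding_cache(sample)
print("%d kB in use without buffers/cache (%.2f%%)" % (in_use, pct))
```

On a live Linux box the same function could be fed with `open("/proc/meminfo").read()`. The result is in the same region as the 22.73% that sar reported after the caches were dropped.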

Step 2: Traverse the / partition. The system caches all the necessary file information, and of course this data has to be stored somewhere in memory; hence %memused goes up. Also, you can see that the second invocation of the 'find' command is extremely fast because it does not have to go back to disk to get the file information (from 94.395 seconds down to 1.169 seconds).

# time find / -mount -print > /dev/null

real    1m34.395s
user    0m0.391s
sys     0m1.256s


# sar -r 1 1
Linux 2.6.18-8.1.8.el5 (chihung)        09/09/2008

03:06:17 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:06:18 PM   1476196    599444     28.88    114676     62212   2031524        84      0.00         0
Average:      1476196    599444     28.88    114676     62212   2031524        84      0.00         0


# time find / -print > /dev/null

real    0m1.169s
user    0m0.368s
sys     0m0.802s

Step 3: This is to show that the content of my newly created file (200MB) is held in memory by the VM cache (kbcached grew by about 200MB and %memused increased by about 10 percentage points). Read man mmap for details.

# dd if=/dev/urandom of=/tmp/newfile bs=20k count=10000
10000+0 records in
10000+0 records out
204800000 bytes (205 MB) copied, 58.9314 seconds, 3.5 MB/s


# sar -r 1 2
Linux 2.6.18-8.1.8.el5 (chihung)        09/09/2008

03:08:54 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:08:55 PM   1274512    801128     38.60    115148    262336   2031524        84      0.00         0
Average:      1274512    801128     38.60    115148    262336   2031524        84      0.00         0


# rm /tmp/newfile
rm: remove regular file `/tmp/newfile'? y


# sar -r 1 2
Linux 2.6.18-8.1.8.el5 (chihung)        09/09/2008

03:09:54 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:09:55 PM   1475024    600616     28.94    115216     62376   2031524        84      0.00         0
Average:      1475024    600616     28.94    115216     62376   2031524        84      0.00         0

Step 4: I processed (computed an MD5 of) every single file in /usr/share in the background while monitoring (running sar -r) in the foreground. As you can see, I did not ask the system to store any of the file content; it is Linux's default behaviour to 'cache' the content.

# du -sh /usr/share
2.2G    /usr/share


# at now
at> find /usr/share -type f -exec md5sum {} \; > /dev/null
at> <EOT>
job 11 at 2008-09-09 15:10


# sar -r 5 100
Linux 2.6.18-8.1.8.el5 (chihung)        09/09/2008

03:10:35 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:10:40 PM   1393712    681928     32.85     46092    224460   2031524        84      0.00         0
03:10:45 PM   1380204    695436     33.50     46252    237752   2031524        84      0.00         0
03:10:50 PM   1361832    713808     34.39     46544    255736   2031524        84      0.00         0
03:10:55 PM   1356128    719512     34.66     46624    261420   2031524        84      0.00         0
03:11:00 PM   1315144    760496     36.64     47060    302112   2031524        84      0.00         0
03:11:05 PM   1284696    790944     38.11     47668    331500   2031524        84      0.00         0
03:11:10 PM   1258344    817296     39.38     48084    357852   2031524        84      0.00         0

03:11:10 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:11:15 PM   1233656    841984     40.57     48752    381428   2031524        84      0.00         0
03:11:20 PM   1214500    861140     41.49     49196    400216   2031524        84      0.00         0
03:11:25 PM   1177320    898320     43.28     49764    436032   2031524        84      0.00         0
03:11:30 PM   1150792    924848     44.56     50276    462236   2031524        84      0.00         0
03:11:35 PM   1116972    958668     46.19     50768    495060   2031524        84      0.00         0
03:11:40 PM   1098172    977468     47.09     51268    513144   2031524        84      0.00         0
03:11:45 PM   1087184    988456     47.62     51332    523948   2031524        84      0.00         0

03:11:45 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:11:50 PM   1071824   1003816     48.36     51508    539348   2031524        84      0.00         0
03:11:55 PM   1038660   1036980     49.96     52020    571084   2031524        84      0.00         0
03:12:00 PM   1009128   1066512     51.38     52476    600048   2031524        84      0.00         0
03:12:05 PM    983304   1092336     52.63     52836    625224   2031524        84      0.00         0
03:12:10 PM    956756   1118884     53.91     53372    650948   2031524        84      0.00         0
03:12:15 PM    940372   1135268     54.69     53744    667116   2031524        84      0.00         0
03:12:20 PM    925048   1150592     55.43     54220    681732   2031524        84      0.00         0

03:12:20 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:12:25 PM    913396   1162244     55.99     54648    693080   2031524        84      0.00         0
03:12:30 PM    909124   1166516     56.20     54944    696932   2031524        84      0.00         0
03:12:35 PM    904196   1171444     56.44     55044    701660   2031524        84      0.00         0
03:12:40 PM    898896   1176744     56.69     55328    706216   2031524        84      0.00         0
03:12:45 PM    895216   1180424     56.87     55748    709320   2031524        84      0.00         0
03:12:50 PM    892456   1183184     57.00     55816    712140   2031524        84      0.00         0
03:12:55 PM    889400   1186240     57.15     55844    715308   2031524        84      0.00         0

03:12:55 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:13:00 PM    885916   1189724     57.32     55856    718716   2031524        84      0.00         0
03:13:05 PM    882616   1193024     57.48     55872    722148   2031524        84      0.00         0
03:13:10 PM    879376   1196264     57.63     55884    725236   2031524        84      0.00         0
03:13:15 PM    875596   1200044     57.82     55956    728996   2031524        84      0.00         0
03:13:20 PM    870316   1205324     58.07     56136    734288   2031524        84      0.00         0
03:13:25 PM    846048   1229592     59.24     56328    757780   2031524        84      0.00         0
03:13:30 PM    810196   1265444     60.97     57048    792428   2031524        84      0.00         0

03:13:30 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:13:35 PM    767804   1307836     63.01     58012    833768   2031524        84      0.00         0
03:13:40 PM    726556   1349084     65.00     58744    874292   2031524        84      0.00         0
03:13:45 PM    694428   1381212     66.54     59116    905580   2031524        84      0.00         0
03:13:50 PM    649688   1425952     68.70     60144    948956   2031524        84      0.00         0
03:13:55 PM    611792   1463848     70.53     60884    985812   2031524        84      0.00         0
03:14:00 PM    568976   1506664     72.59     61592   1027748   2031524        84      0.00         0
03:14:05 PM    540652   1534988     73.95     61992   1054012   2031524        84      0.00         0

03:14:05 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:14:10 PM    509420   1566220     75.46     62756   1084120   2031524        84      0.00         0
03:14:15 PM    446928   1628712     78.47     63676   1145888   2031524        84      0.00         0
03:14:20 PM    424768   1650872     79.54     64092   1167432   2031524        84      0.00         0
03:14:25 PM    369460   1706180     82.20     65052   1221272   2031524        84      0.00         0
03:14:30 PM    343220   1732420     83.46     65544   1246748   2031524        84      0.00         0
03:14:35 PM    335232   1740408     83.85     65664   1254504   2031524        84      0.00         0
03:14:40 PM    312048   1763592     84.97     65984   1276860   2031524        84      0.00         0

03:14:40 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:14:45 PM    277980   1797660     86.61     66816   1309520   2031524        84      0.00         0
03:14:50 PM    258992   1816648     87.52     67696   1327724   2031524        84      0.00         0
03:14:55 PM    232688   1842952     88.79     68372   1353356   2031524        84      0.00         0
03:15:00 PM    211608   1864032     89.81     68876   1373732   2031524        84      0.00         0
03:15:05 PM    189792   1885848     90.86     69360   1394508   2031524        84      0.00         0
03:15:10 PM    163984   1911656     92.10     69900   1419764   2031524        84      0.00         0
03:15:15 PM    143396   1932244     93.09     70384   1440224   2031524        84      0.00         0

03:15:15 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:15:20 PM    126700   1948940     93.90     70776   1456592   2031524        84      0.00         0
03:15:25 PM    101996   1973644     95.09     71372   1479824   2031524        84      0.00         0
03:15:30 PM     70808   2004832     96.59     72040   1510020   2031524        84      0.00         0
03:15:35 PM     51624   2024016     97.51     69880   1530840   2031524        84      0.00         0
03:15:40 PM     52176   2023464     97.49     64064   1536432   2031524        84      0.00         0
03:15:45 PM     53048   2022592     97.44     59980   1540968   2031524        84      0.00         0
03:15:50 PM     52100   2023540     97.49     55668   1546316   2031524        84      0.00         0

03:15:50 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:15:55 PM     52396   2023244     97.48     53908   1547744   2031524        84      0.00         0
03:16:00 PM     52264   2023376     97.48     54648   1547500   2031524        84      0.00         0

In this exercise, I did not explicitly consume or allocate memory. It is the default behaviour of Linux to buffer and cache file content and metadata. IMHO, you do not have to worry about a high %memused. It is a non-zero and constantly rising kbswpused and %swpused that indicate swapping, and that means a memory shortage.
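To put a number on this, here is a minimal Python sketch that takes the 03:15:50 PM sample from the sar output above and subtracts the buffers and cache from kbmemused. The function name `used_excluding_cache` is my own, and treating all of kbbuffers and kbcached as reclaimable is a rule of thumb rather than an exact accounting.

```python
# Column values copied from the 03:15:50 PM sample above (all in kB).
kbmemfree = 52100
kbmemused = 2023540
kbbuffers = 55668
kbcached = 1546316


def used_excluding_cache(memused, buffers, cached):
    """Memory in use once buffers and page cache are subtracted.

    Assumption: all buffers and cache can be reclaimed by the kernel.
    """
    return memused - buffers - cached


total = kbmemfree + kbmemused
app_used = used_excluding_cache(kbmemused, kbbuffers, kbcached)

print("reported %memused:", round(100.0 * kbmemused / total, 2))
print("used excluding buffers/cache (kB):", app_used)
print("as a share of total (%):", round(100.0 * app_used / total, 2))
```

Under that assumption, only a small part of the 97% reported as used is held by anything other than buffers and cache, which is why the swap columns are the better place to look.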

While I was driving home, I thought I may as well show my audience what swapping is like. See part 2


Friday, September 05, 2008

Comparing Timestamp Between Two Files

If you need to compare the modification time (mtime) of two files, you can use the shell construct:
if [ "myfile1" -nt "myfile2" ]
However, this test is provided by the Korn and Bash shells, not by the classic Bourne shell, and it compares only the modification time. If you program in Bourne shell, you will have to rely on an external utility such as Perl, Python or Tcl. If you need to compare the inode change time (ctime, which is often mistaken for creation time) or the access time (atime), or even compare a local file against a remote file, the shell function below will help. The "epoch" function returns the epoch atime/ctime/mtime of the input file using Perl, Python or Tcl, whichever it finds first. It is highly unlikely that a modern UNIX operating system has none of these utilities installed.
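epoch()
{
 # Use the first interpreter found on PATH: perl, then python, then tclsh
 which perl > /dev/null 2>&1
 if [ $? -eq 0 ]; then
  epoch_in_perl "$@"
  return $?
 fi
 which python > /dev/null 2>&1
 if [ $? -eq 0 ]; then
  epoch_in_python "$@"
  return $?
 fi
 which tclsh > /dev/null 2>&1
 if [ $? -eq 0 ]; then
  epoch_in_tclsh "$@"
  return $?
 fi
 # None of the interpreters is available
 echo "Error. None of perl, python or tclsh was found" 1>&2
 return 1
}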

epoch_in_perl()
{
 if [ $# -ne 2 ]; then
  echo "Usage: epoch atime|mtime|ctime file" 1>&2
  return 1
 fi
 if [ ! -e "$2" ]; then
  echo "Error. \"$2\" does not exist" 1>&2
  return 1
 fi
  
 case "$1" in
 atime)
  # pass the file name as an argument so spaces and quotes are safe
  perl -e '@a=stat($ARGV[0]);print "$a[8]\n"' "$2"
  ;;
 mtime)
  perl -e '@a=stat($ARGV[0]);print "$a[9]\n"' "$2"
  ;;
 ctime)
  perl -e '@a=stat($ARGV[0]);print "$a[10]\n"' "$2"
  ;;
 *)
  echo "Usage: epoch atime|mtime|ctime file" 1>&2
  return 1
  ;;
 esac
}


epoch_in_python()
{
 if [ $# -ne 2 ]; then
  echo "Usage: epoch atime|mtime|ctime file" 1>&2
  return 1
 fi
 if [ ! -e "$2" ]; then
  echo "Error. \"$2\" does not exist" 1>&2
  return 1
 fi
  
 case "$1" in
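 atime)
  # print(...) with one argument works in both Python 2 and Python 3
  python -c 'import os.path,sys;print(int(os.path.getatime(sys.argv[1])))' "$2"
  ;;
 mtime)
  python -c 'import os.path,sys;print(int(os.path.getmtime(sys.argv[1])))' "$2"
  ;;
 ctime)
  python -c 'import os.path,sys;print(int(os.path.getctime(sys.argv[1])))' "$2"
  ;;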
 *)
  echo "Usage: epoch atime|mtime|ctime file" 1>&2
  return 1
  ;;
 esac
}

epoch_in_tclsh()
{
 if [ $# -ne 2 ]; then
  echo "Usage: epoch atime|mtime|ctime file" 1>&2
  return 1
 fi
 if [ ! -e "$2" ]; then
  echo "Error. \"$2\" does not exist" 1>&2
  return 1
 fi
  
 case "$1" in
 atime)
  echo "file stat \"$2\" a;puts [set a(atime)]" | tclsh
  ;;
 mtime)
  echo "file stat \"$2\" a;puts [set a(mtime)]" | tclsh
  ;;
 ctime)
  echo "file stat \"$2\" a;puts [set a(ctime)]" | tclsh
  ;;
 *)
  echo "Usage: epoch atime|mtime|ctime file" 1>&2
  return 1
  ;;
 esac
}

Here is how you can use this function:

$ epoch mtime myfile1
1201206566

$ epoch atime myfile2
1220593896

$ if [ `epoch ctime myfile1` -lt `epoch ctime myfile2` ]; then
    dosomething
  fi

$ # the remote host must also provide an epoch command
$ if [ `epoch mtime myfile1` -gt `ssh chihung@remote epoch mtime myfile1` ]; then
    dosomething
  fi
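
If you would rather stay within Python for the whole comparison, the same lookup can be sketched with os.stat. This is only an illustration of the idea behind the shell function: the names `epoch` and `is_newer` are my own, and it handles local files only.

```python
import os


def epoch(kind, path):
    """Return the atime, mtime or ctime of path as whole epoch seconds."""
    if kind not in ("atime", "mtime", "ctime"):
        raise ValueError("kind must be atime, mtime or ctime")
    st = os.stat(path)  # raises OSError if the file does not exist
    return int(getattr(st, "st_" + kind))


def is_newer(path_a, path_b, kind="mtime"):
    """True if path_a has a later timestamp of the given kind than path_b."""
    return epoch(kind, path_a) > epoch(kind, path_b)
```

With this, is_newer("myfile1", "myfile2") plays the role of the -nt test, and passing kind="ctime" or kind="atime" covers the other two timestamps.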
