Tuesday, November 18, 2008

Manipulate lots of .tar.gz files

If you need to manipulate a lot of .tar.gz files and do not want to uncompress and untar them first, you may want to script the job in Python. Thanks to Python's "batteries included" standard library, you hardly have to reinvent the wheel.

The tarfile module lets you work with tar files with ease. It even lets you read and write tar files compressed with either gzip or bzip2.
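Here is a quick sketch of those compression modes in action. It builds a tiny archive in memory and reads it back with 'r:gz', 'r:bz2' and 'r:*'. With 'r:*', tarfile detects the compression itself. The member name demo/hello.txt and the helper names make_archive and read_member are made up for illustration only.

```python
import io
import tarfile


def make_archive(mode):
    """Build a tiny in-memory tar archive using the given write mode ("w:gz", "w:bz2", ...)."""
    buf = io.BytesIO()
    data = b"hello from tarfile\n"
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        # Hypothetical member, just to have something to read back
        info = tarfile.TarInfo(name="demo/hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(blob, mode, name):
    """Open an archive held in bytes with the given read mode and return one member's contents."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode=mode) as tar:
        member = tar.extractfile(name)
        return member.read()


gz_blob = make_archive("w:gz")
bz2_blob = make_archive("w:bz2")

print(read_member(gz_blob, "r:gz", "demo/hello.txt"))
print(read_member(bz2_blob, "r:bz2", "demo/hello.txt"))
# "r:*" lets tarfile figure out the compression on its own
print(read_member(bz2_blob, "r:*", "demo/hello.txt"))
```

In practice, 'r:*' is handy when a directory mixes .tar.gz and .tar.bz2 files.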

Recently I needed to extract information from a lot of Sun's SUNWexplo explorer files in .tar.gz format. Suppose I have a few explorer files and need to print the disk error line, which is every 5th line (starting with the first) of the "iostat -E" output stored in the file "explorer.hostid.machine-yyyy.mm.dd.hh.min/disks/iostat_-E.out":

$ ls *.tar.gz
explorer.12345678.machine1-2008.10.11.13.03.tar.gz
explorer.90abcdef.machine2-2008.10.04.19.04.tar.gz
explorer.98765432.machine3-2008.11.15.09.05.tar.gz

$ for i in *.tar.gz; do gunzip < $i | tar tvf - | grep iostat_-E.out; done
-rw-r--r-- root/root      6428 2008-10-11 13:03 explorer.12345678.machine1-2008.10.11.13.03/disks/iostat_-E.out
-rw-r--r-- root/root      8334 2008-10-14 19:04 explorer.90abcdef.machine2-2008.10.04.19.04/disks/iostat_-E.out
-rw-r--r-- root/root     12864 2008-11-15 09:05 explorer.98765432.machine3-2008.11.15.09.05/disks/iostat_-E.out
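The gunzip | tar | grep pipeline can also be done from Python. Below is a minimal sketch. The helper name find_members is my own and is not part of tarfile. It only lists matching member names, more like "tar tf" than "tar tvf".

```python
import glob
import tarfile


def find_members(path, suffix="iostat_-E.out"):
    """Return names of members in a .tar.gz whose path ends with `suffix`,
    roughly what `gunzip < path | tar tf - | grep suffix` would show."""
    with tarfile.open(path, "r:gz") as targz:
        return [m.name for m in targz.getmembers() if m.name.endswith(suffix)]


if __name__ == "__main__":
    for explo in sorted(glob.glob("*.tar.gz")):
        for name in find_members(explo):
            print("%s: %s" % (explo, name))
```

If you also want the size and date columns of "tar tvf", each TarInfo returned by getmembers() carries them in its size and mtime attributes.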

Here is a code snippet you may want to try yourself:

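import glob
import os
import tarfile

for explo in glob.glob("*.tar.gz"):
    # The top-level directory inside the archive is the file name minus ".tar.gz"
    topdir = os.path.basename(explo)[:-len(".tar.gz")]
    with tarfile.open(explo, "r:gz") as targz:
        try:
            iostat = targz.extractfile("%s/disks/iostat_-E.out" % topdir)
        except KeyError:
            # extractfile() raises KeyError if the member is not in the archive
            print("%s: iostat_-E.out not found" % explo)
            continue
        # "iostat -E" prints 5 lines per device; the first one holds the error counts
        for count, line in enumerate(iostat):
            if count % 5 == 0:
                print(line.decode("ascii", "replace").rstrip())
        iostat.close()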

You may want to expand the above code so that it alerts the sysadmin folks when the number of soft, hard, or transport errors exceeds a certain threshold.
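One way to start on that is sketched below, under a few assumptions. I assume the first line of each device block in "iostat -E" looks like "sd0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0". The threshold values and the names ERROR_RE, THRESHOLDS and check_errors are hypothetical, so adjust them for your own site.

```python
import re

# ASSUMED format of the per-device error line in Solaris "iostat -E" output:
#   sd0       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
ERROR_RE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)

# Hypothetical thresholds: alert when a counter goes ABOVE its limit
THRESHOLDS = {"soft": 10, "hard": 0, "transport": 0}


def check_errors(lines, thresholds=THRESHOLDS):
    """Return (device, kind, count) for every counter that exceeds its threshold."""
    alerts = []
    for line in lines:
        m = ERROR_RE.match(line)
        if not m:
            continue  # not a per-device error line (vendor, size, etc.)
        for kind, limit in thresholds.items():
            count = int(m.group(kind))
            if count > limit:
                alerts.append((m.group("dev"), kind, count))
    return alerts


if __name__ == "__main__":
    sample = [
        "sd0       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0",
        "Vendor: SEAGATE  Product: ST336607LSUN36G Revision: 0307",
        "sd1       Soft Errors: 12 Hard Errors: 3 Transport Errors: 0",
    ]
    for dev, kind, count in check_errors(sample):
        print("ALERT: %s has %d %s errors" % (dev, count, kind))
```

To hook it into the loop above, you could collect the decoded lines from each iostat_-E.out and pass them to check_errors(). Printing, mailing or paging the alerts is left to your own setup.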
