Thursday, August 07, 2008

An Old Task In Python, Take 2

I am going to tackle this old task again in Python, this time using quite a few modules. I will save each unique session id, along with its start and end time, in an SQLite database. I will also parse a couple of gzipped access logs without having to gunzip them first.
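As a rough sketch of what reading a gzipped log without unpacking it first can look like, the snippet below writes a tiny throwaway .gz file and reads it back line by line with gzip.open. The helper name read_gzip_lines, the file name sample_access.log.gz and the file contents are invented for illustration, and the text mode 'rt' assumes Python 3.

```python
import gzip
import os
import tempfile


def read_gzip_lines(path):
    """Return the lines of a gzip-compressed text file, decompressed on the fly."""
    with gzip.open(path, 'rt') as f:
        return [line.rstrip('\n') for line in f]


# Build a small throwaway file so the sketch is self-contained.
tmpdir = tempfile.mkdtemp()
sample_path = os.path.join(tmpdir, 'sample_access.log.gz')  # hypothetical name
with gzip.open(sample_path, 'wt') as f:
    f.write('first line\nsecond line\n')

lines = read_gzip_lines(sample_path)
print(lines)
```

Nothing is ever written to disk in uncompressed form; the file object returned by gzip.open decompresses as it is iterated.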

In this exercise, the power of Python lies in its 'batteries-included' modules. The downside is that even a simple 'cd' to change directory requires you to 'import os' and then call os.chdir('some/dir/rect/ory'). The code below shows how I populate the SQLite database from the web access logs.

import glob
import gzip
import sqlite3
import time


def processLine(conn, line):
    # locate the unique session id; skip lines that do not carry one
    p1 = line.find('jsessionid=')
    if p1 < 0:
        return
    p2 = line.find('?', p1)
    if p2 < 0:
        return
    sess = line[p1 + 11:p2]

    # the fourth field of the access log holds the timestamp, e.g. [07/Aug/2008:10:15:30
    ts = time.strptime(line.split()[3][1:], '%d/%b/%Y:%H:%M:%S')
    epoch = int(time.mktime(ts))

    cursor = conn.cursor()
    cursor.execute('select 1 from sessions where id = ?', (sess,))
    if cursor.fetchone() is None:
        # first time we see this session: epoch is both start and end time
        cursor.execute('insert into sessions values (?, ?, ?)', (sess, epoch, epoch))
    else:
        # session already known: epoch becomes the new end time
        cursor.execute('update sessions set end = ? where id = ?', (epoch, sess))


dbfile = 'ndp.db'
conn = sqlite3.connect(dbfile)
conn.execute('create table if not exists sessions (id text, start int, end int)')

count = 0
for gz in glob.glob('*.gz'):
    # 'rt' reads the compressed log as text, decompressing on the fly
    with gzip.open(gz, 'rt') as f:
        for line in f:
            count += 1
            processLine(conn, line)
            if count % 500 == 0:
                print('commit')
                conn.commit()

# commit whatever is left since the last batch
conn.commit()
conn.close()
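
To see the session bookkeeping in isolation, here is a minimal sketch that applies the same idea to two made-up log lines and an in-memory database. The helper name record_hit, the sample lines and the session id ABC123 are all invented for illustration. It assumes the timestamp sits in the fourth whitespace-separated field and that the session id ends at a '?'. As a variation on the code above, it tries the update first and inserts only when no row was touched.

```python
import sqlite3
import time


def record_hit(conn, line):
    """Store the first and last time a jsessionid was seen (hypothetical helper)."""
    p1 = line.find('jsessionid=')
    p2 = line.find('?', p1) if p1 >= 0 else -1
    if p2 < 0:
        return  # no session id on this line
    sess = line[p1 + len('jsessionid='):p2]
    ts = time.strptime(line.split()[3][1:], '%d/%b/%Y:%H:%M:%S')
    epoch = int(time.mktime(ts))
    # Try to move the end time forward; insert a new row if nothing matched.
    cur = conn.execute('update sessions set end = ? where id = ?', (epoch, sess))
    if cur.rowcount == 0:
        conn.execute('insert into sessions values (?, ?, ?)', (sess, epoch, epoch))


conn = sqlite3.connect(':memory:')
conn.execute('create table sessions (id text, start int, end int)')

# Made-up lines in the common access log layout
sample = [
    '10.0.0.1 - - [07/Aug/2008:10:00:00 +0800] "GET /app/a.jsp;jsessionid=ABC123?x=1 HTTP/1.1" 200 512',
    '10.0.0.1 - - [07/Aug/2008:10:05:30 +0800] "GET /app/b.jsp;jsessionid=ABC123?y=2 HTTP/1.1" 200 128',
]
for line in sample:
    record_hit(conn, line)
conn.commit()

# One row per session, with its duration in seconds
rows = conn.execute('select id, end - start from sessions').fetchall()
print(rows)
```

With start and end stored as epoch seconds, the length of a session is simply end minus start, which is what the final query computes.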
