An Old Task In Python, Take 2
I am going to tackle this old task in Python, using quite a few of its modules. This time I will save each unique session id, together with its start and end time, in an SQLite database. I will also parse a couple of gzipped access logs directly, without having to gunzip them first.
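The gzip module is what makes the "no gunzip" part work. Below is a minimal sketch, assuming Python 3: in text mode ('rt'), gzip.open decompresses and decodes lazily as you iterate, so the uncompressed log never touches the disk. The helper name read_gz_lines is hypothetical, and UTF-8 with errors='replace' is an assumption about the log encoding.

```python
import gzip


def read_gz_lines(source):
    """Yield text lines from a gzip file without unpacking it to disk.

    `source` may be a file name or an already open binary file object,
    since gzip.open accepts either.
    """
    # 'rt' = read in text mode; decompression happens chunk by chunk
    # as the loop asks for more lines. UTF-8 is an assumed log encoding.
    with gzip.open(source, 'rt', encoding='utf-8', errors='replace') as f:
        for line in f:
            yield line.rstrip('\n')
```

Usage is a plain loop, e.g. `for line in read_gz_lines('access.log.gz'): ...` (the file name is only an example).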
In this exercise, Python's power lies in its "batteries included" standard library. The downside is that even a simple 'cd' to change directory means you need to 'import os' and then call os.chdir('some/dir/rect/ory'). The code below shows how I load the web access logs into the SQLite database:
```python
import glob
import gzip
import sqlite3
import time


def processLine(conn, line):
    cursor = conn.cursor()
    # 4th field is the timestamp, e.g. "[10/Oct/2000:13:55:36"; skip the leading '['
    ts = time.strptime(line.split()[3][1:], '%d/%b/%Y:%H:%M:%S')
    epoch = int(time.mktime(ts))
    try:
        # locate the unique session id between 'jsessionid=' and the following '?'
        p1 = line.index('jsessionid=')
        p2 = line.index('?', p1)
        sess = line[p1 + len('jsessionid='):p2]
    except ValueError:
        # no session id on this line, nothing to record
        return
    # the first hit of a session sets its start time; every later hit moves its end time
    cursor.execute("select 1 from sessions where id = ?", (sess,))
    if cursor.fetchone() is None:
        cursor.execute("insert into sessions values (?, ?, ?)", (sess, epoch, epoch))
    else:
        cursor.execute("update sessions set end = ? where id = ?", (epoch, sess))


dbfile = 'ndp.db'
conn = sqlite3.connect(dbfile)
conn.execute('create table if not exists sessions (id text, start int, end int)')
count = 0
for gz in glob.glob('*.gz'):
    # 'rt' decompresses and decodes on the fly, so there is no need to gunzip first
    for line in gzip.open(gz, 'rt'):
        count = count + 1
        processLine(conn, line)
        # commit in batches of 500 lines
        if count % 500 == 0:
            print('commit', end=' ')
            conn.commit()
# commit the final partial batch
conn.commit()
conn.close()
```
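The insert-or-update logic above assumes the log lines arrive in time order, but glob.glob returns files in arbitrary order, so a later file read first would leave the wrong start and end times. Here is a sketch of an order-independent variant, assuming SQLite 3.24 or newer (for INSERT ... ON CONFLICT) and a fresh database, since the original table has no primary key. The function names create_schema, record_hit and session_lengths are hypothetical; the columns match the original table.

```python
import sqlite3


def create_schema(conn):
    # Same columns as before, but id is a PRIMARY KEY so that
    # ON CONFLICT(id) can detect a session that already has a row.
    # "end" is quoted to sidestep its status as an SQL keyword.
    conn.execute(
        'create table if not exists sessions '
        '(id text primary key, start int, "end" int)'
    )


def record_hit(conn, sess, epoch):
    # Insert a brand-new session, or widen an existing session's
    # [start, end] window. Two-argument min()/max() are SQLite scalar
    # functions, so the result does not depend on the order of the hits.
    conn.execute(
        'insert into sessions (id, start, "end") values (?, ?, ?) '
        'on conflict(id) do update set '
        'start = min(start, excluded.start), '
        '"end" = max("end", excluded."end")',
        (sess, epoch, epoch),
    )


def session_lengths(conn):
    # Session duration in seconds, one row per session id.
    return conn.execute(
        'select id, "end" - start from sessions order by id'
    ).fetchall()
```

This also replaces the separate select/update/insert round trips with a single statement per hit. With hits ('abc', 200), ('abc', 100), ('abc', 300), the row for abc ends up with start 100 and end 300 whichever order they are recorded in.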
Labels: python