An Old Task In Python, Take 2
I am going to tackle this
old task in
Python, using quite a few modules. This time I will save each unique session id, along with its start and end time, in an
SQLite database. I will also parse a couple of gzipped access logs directly, without having to gunzip them first.
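Reading the logs straight from the .gz files takes only the gzip module. Here is a minimal sketch of that streaming read on its own. The directory, the file names access.1.gz and access.2.gz, and the log line are all made up for the demo. The helper iter_log_lines is a hypothetical name, and the latin-1 encoding is my assumption, chosen because it can decode any byte a log might contain. The directory goes into the glob pattern, so there is no need to change into it first.

```python
import glob
import gzip
import os
import tempfile

def iter_log_lines(logdir):
    """Yield every line from every *.gz file in logdir, decompressing on the fly."""
    # Put the directory in the pattern instead of os.chdir()-ing into it;
    # sorted() gives a stable order, since glob's order depends on the file system.
    for path in sorted(glob.glob(os.path.join(logdir, '*.gz'))):
        # 'rt' opens the compressed stream as text: no gunzip step, no temp file.
        # latin-1 is an assumption: it maps every byte, so odd bytes never raise.
        with gzip.open(path, 'rt', encoding='latin-1') as f:
            for line in f:
                yield line.rstrip('\n')

# Demo: a made-up log line written to two throwaway .gz files
sample = ('127.0.0.1 - - [10/Oct/2009:13:55:36 +0800] '
          '"GET /app;jsessionid=ABC123?page=1 HTTP/1.1" 200 512\n')
logdir = tempfile.mkdtemp()
for name, copies in (('access.1.gz', 2), ('access.2.gz', 3)):
    with gzip.open(os.path.join(logdir, name), 'wt', encoding='latin-1') as f:
        f.write(sample * copies)

lines = list(iter_log_lines(logdir))
print(len(lines))  # 5
```

Because the generator yields one line at a time, the whole log never has to be decompressed to disk or held in memory.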
In this exercise, Python's power lies in its 'batteries-included' standard library. The downside is that even a simple 'cd' to change directory means you have to import os and then call os.chdir('some/dir/rect/ory'). Below is how I load the web access logs into the SQLite database:
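import glob
import gzip
import sqlite3
import time

def process_line(conn, line):
    """Record the session id and hit time found in one access-log line."""
    # Field 4 looks like "[10/Oct/2009:13:55:36"; drop the leading '['.
    # mktime() treats this as local time and ignores the "+0800" offset field.
    ts = time.strptime(line.split()[3][1:], '%d/%b/%Y:%H:%M:%S')
    epoch = int(time.mktime(ts))
    try:
        # Locate the unique session id, e.g. ";jsessionid=ABC123?..."
        p1 = line.index('jsessionid=')
        p2 = line.index('?', p1)
    except ValueError:
        return  # no session id on this line
    sess = line[p1 + len('jsessionid='):p2]
    cursor = conn.cursor()
    # Parameterised queries (?) instead of % formatting: no quoting bugs
    cursor.execute('select 1 from sessions where id = ?', (sess,))
    if cursor.fetchone():
        # Known session: this hit becomes the new end time
        cursor.execute('update sessions set "end" = ? where id = ?', (epoch, sess))
    else:
        # First hit: start and end are the same timestamp
        cursor.execute('insert into sessions values (?, ?, ?)', (sess, epoch, epoch))

conn = sqlite3.connect('ndp.db')
conn.execute('create table if not exists sessions (id text, start int, "end" int)')
count = 0
for gz in glob.glob('*.gz'):
    # Text mode decompresses on the fly, so no gunzip step is needed;
    # errors='replace' keeps stray non-text bytes from aborting the run
    with gzip.open(gz, 'rt', errors='replace') as f:
        for line in f:
            count += 1
            process_line(conn, line)
            if count % 500 == 0:
                print('commit', count)
                conn.commit()
conn.commit()  # flush the last partial batch
conn.close()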
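One caveat with the script above: it assumes hits arrive in time order, so the last line seen becomes a session's end time. glob.glob() returns files in no guaranteed order, so a session that spans two log files could end up with the wrong window. Below is a minimal sketch of an order-independent alternative. It assumes id is declared as the PRIMARY KEY (a schema change from the original table) so that INSERT OR IGNORE can spot existing sessions. open_db, record_hit and session_durations are hypothetical helper names, and the demo hits are invented.

```python
import sqlite3

def open_db(path=':memory:'):
    conn = sqlite3.connect(path)
    # Assumption: id is a PRIMARY KEY so INSERT OR IGNORE can detect duplicates
    conn.execute('create table if not exists sessions '
                 '(id text primary key, start int, "end" int)')
    return conn

def record_hit(conn, sess, epoch):
    """Widen the session's [start, end] window so it includes epoch."""
    # Creates the row on a session's first hit; does nothing if it already exists
    conn.execute('insert or ignore into sessions values (?, ?, ?)',
                 (sess, epoch, epoch))
    # SQLite's multi-argument min()/max() are scalar functions
    conn.execute('update sessions set start = min(start, ?), "end" = max("end", ?) '
                 'where id = ?', (epoch, epoch, sess))

def session_durations(conn):
    """Return (id, seconds) pairs, longest session first."""
    return conn.execute('select id, "end" - start from sessions '
                        'order by "end" - start desc, id').fetchall()

# Hits deliberately fed out of order, as if the .gz files were globbed unsorted
conn = open_db()
for sess, epoch in [('B', 200), ('A', 150), ('A', 100), ('B', 260), ('A', 130)]:
    record_hit(conn, sess, epoch)
conn.commit()
print(session_durations(conn))  # [('B', 60), ('A', 50)]
```

Letting SQLite take the min/max means the result no longer depends on the order in which the log files or lines are processed.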