Chi Hung Chan: October 2007

Friday, October 26, 2007

OpenSolaris and Tcl

I just installed OpenSolaris Build 75 and booted it up with xVM (Xen). I tried BrandZ with CentOS 3 on OpenSolaris Build 59 quite some time ago, but I was not very pleased with BrandZ running an old Linux 2.4 kernel.

Anyway, I need to dig into Xen and try to understand all the ins and outs. Stay tuned, for sure I will blog about my findings.

One of the findings is that Tcl is now in the mainstream system packages. The Tcl binding for PostgreSQL is fully integrated too.

In Solaris 10, Tcl lived in the /usr/sfw (Sunfreeware) directory and was a pretty old version, Tcl 8.3.3.

# pkginfo | grep -i tcl
system      SUNWTcl                          Tcl - Tool Command Language
system      SUNWTk                           Tk - TCL GUI Toolkit
system      SUNWpostgr-82-pl                 PostgreSQL 8.2 additional Perl, Python & TCL server procedural languages
system      SUNWpostgr-82-tcl                Tcl binding library for PostgreSQL 8.2
system      SUNWpostgr-tcl                   A Tcl client library for PostgreSQL

# pkginfo -l SUNWTcl
   PKGINST:  SUNWTcl
      NAME:  Tcl - Tool Command Language
  CATEGORY:  system
      ARCH:  i386
   VERSION:  11.11.0,REV=2007.10.02.00.53
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  Tcl - Tool Command Language (8.4.14)
    PSTAMP:  sfwnv-x20071002010749
  INSTDATE:  Oct 26 2007 13:40
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:      736 installed pathnames
                   9 shared pathnames
                  17 directories
                   3 executables
               10154 blocks used (approx)

Labels: opensolaris, Tcl

Wednesday, October 17, 2007

Table^10

If you watch TV often enough, you will probably come across this ad about the new web site, Mocca.com (MediaCorp Online Communities and Classified Advertising).

The above site looks pretty good in my Firefox, with a response time of about 3 seconds for 121 requests (413KB in total). The numbers come from the Firefox add-on Firebug. 100+ requests for a single page is quite a lot.

What I normally do when I find a site "interesting" is look at the HTML code. Guess what, I realised that there are a lot of issues with the HTML source.

The redirection is not done properly. Normally the server redirects with a 3xx status code and a Location header, which saves the browser from fetching and rendering an extra HTML page. Here the server returns 200 OK with a Content-Location header and a 61-byte page that relies on a META refresh. Also, that HTML is definitely ill-formed and incomplete.

$ curl --dump-header /dev/tty http://mocca.com/
HTTP/1.1 200 OK
Content-Length: 61
Content-Type: text/html
Content-Location: http://mocca.com/index.htm
Last-Modified: Wed, 26 Sep 2007 09:30:19 GMT
Accept-Ranges: bytes
ETag: "b8a39eda1f0c81:17de"
X-Powered-By: ASP.NET
Date: Tue, 16 Oct 2007 08:46:21 GMT
Server: Concealed by Juniper Networks DX
Via: 1.1 MC-LB1 (Juniper Networks Application Acceleration Platform - DX 5.2.5 0)
Set-Cookie: rl-sticky-key=caacb80750; path=/; expires=Tue, 16 Oct 2007 09:31:03GMT

<META HTTP-EQUIV="Refresh" CONTENT="1; URL=/portal/site/cas">
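
To tell the two kinds of redirection apart without reading the dump by eye, here is a minimal shell sketch. It assumes the full response (headers, blank line, body) arrives on stdin; the function name redirect_kind and its three labels are hypothetical, not part of curl or of the site.

```sh
#!/bin/sh
# redirect_kind: read an HTTP response (headers, blank line, body) on stdin
# and print how it sends the client elsewhere:
#   http-redirect  3xx status line plus a Location header
#   meta-refresh   HTML <meta http-equiv="refresh"> in the body
#   none           neither of the above
# The name and the labels are illustrative only.
redirect_kind() {
    awk '
        { sub(/\r$/, "") }                        # tolerate CRLF line endings
        NR == 1 && $2 ~ /^3[0-9][0-9]$/ { status3xx = 1 }
        !in_body && /^$/ { in_body = 1; next }    # blank line ends the headers
        !in_body && tolower($0) ~ /^location:/ { location = 1 }
        in_body && tolower($0) ~ /<meta[^>]*http-equiv="?refresh/ { meta = 1 }
        END {
            if (status3xx && location) print "http-redirect"
            else if (meta)             print "meta-refresh"
            else                       print "none"
        }
    '
}
```

Fed with a response like the one above (for example through "curl --silent --include http://mocca.com/ | redirect_kind"), it reports meta-refresh, because Content-Location is not treated as a redirect header.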

When I fetched the home page http://mocca.com/portal/site/cas, I realised that the HTML code has the <title> node before the <html> node. If you convert this HTML to a DOM tree, you will likely end up with just 1 node in the tree. I also submitted the URL to the W3C Markup Validation Service, and it reported 218 errors.

$ curl --silent http://mocca.com/portal/site/cas 2>&1 | awk '$0!~/^$/{print}' | head -10

<title>MediaCorp Mocca </title>
<html xmlns="http://www.w3.org/1999/xhtml">
        <head>

<link href="/vgn-ext-templating/common/styles/vgn-ext-templating.css" rel="stylesheet" type="text/css"></link>
          <script language="JavaScript" src="/portal/jslib/form_state_manager.js"></script>
          <noscript>In order to bring you the best possible user experience, this site uses Javascript. If you are seeing this message, it is likely that the Javascript option in your browser is disabled. For optimal viewing of this site, please ensure that Javascript is enabled for your browser.
          </noscript>
                <base target="_top">

I realised there are a lot of tables within tables within tables. Hey, that's interesting. At the back of my mind I was wondering how deep this nesting of tables goes. A Tcl program with the tDOM extension should do the job. Of course I have to remove the first <title> before parsing, otherwise I will end up with 1 node in the tree.

package require tdom

# Count the <table> steps in the XPath of a node, i.e. the nesting
# depth of the node with the node itself included.
proc howdeep { node } {
 set count 0
 foreach step [split [$node toXPath] /] {
  if { [string match -nocase "table*" $step] } {
   incr count
  }
 }
 return $count
}

# saved copy of the home page with the leading <title> removed
set html index.html-modified
set doc [dom parse -html [tDOM::xmlReadFile $html]]
set root [$doc documentElement]

# find the most deeply nested table
set tables [$root selectNodes {//table}]
set max 0
set maxnode {}
foreach table $tables {
 set level [howdeep $table]
 if { $level > $max } {
  set max $level
  set maxnode $table
 }
}
puts $max
puts [llength $tables]
puts [$maxnode toXPath]

Output of this program:

10
144
/html/body/table/tr[2]/td[2]/table/tr[2]/td/table/tr/td[3]/table/tr/td/table/tr/td/table[1]/tr/td[1]/table/tr[2]/td[2]/table/tr/td[1]/table/tr[1]/td/table
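
As a rough cross-check that needs no DOM parser, the same two numbers can be estimated by counting opening and closing table tags. This is only a sketch: it assumes every <table> has a matching </table> and that no script or comment contains table tags, and the name max_table_depth is hypothetical.

```sh
#!/bin/sh
# max_table_depth: print "<max nesting depth> <number of tables>" for the
# HTML on stdin. It counts tags, it does not parse HTML, so it is only
# reliable when <table> and </table> are balanced.
max_table_depth() {
    # Put every tag at the start of its own line, then look at the tag name.
    tr '<' '\n' | awk '
        {
            tag = tolower($0)
            if (tag ~ /^table([ \t>]|$)/) {          # opening <table ...>
                count++
                depth++
                if (depth > max) max = depth
            } else if (tag ~ /^\/table([ \t>]|$)/) { # closing </table>
                if (depth > 0) depth--
            }
        }
        END { print max + 0, count + 0 }
    '
}
```

Run as "max_table_depth < index.html-modified", it prints the depth and the table count on one line. On ill-formed pages the tDOM result is the one to trust, since the parser repairs the tree first.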

Wow, we are talking about a nesting of 10 levels of <table> and a total of 144 tables, that's a lot! So, what's the conclusion?

Modern browsers are so forgiving, and they normally do a very good job of rendering complex and even ill-formed HTML code.
Even with such deep nesting of tables and 100+ requests, the browser is able to render the content in seconds. That is amusing.

Labels: html, Tcl, tDOM, xpath

Monday, October 15, 2007

Freemind for visualisation

In my previous blog post on Firewall Navigation, I promised to explore the use of Freemind to visualise the firewall rules. Let me give you a glimpse of it, but I can tell you that my web version is still the best.

If you model (mind map) something in Freemind, you will get an XML file when you save your mind map. Below is a sample of the XML file:

<map version="0.8.0">
<node TEXT="pixfirewall">
<node TEXT="name" FOLDED="true">
<node ID="host11" TEXT="host1&#xa;10.0.1.1"/>
<node ID="host12" TEXT="host1&#xa;10.0.1.2"/>
<node ID="host13" TEXT="host1&#xa;10.0.1.3"/>
<node ID="host14" TEXT="host1&#xa;10.0.1.4"/>
</node>
<node TEXT="access-group" POSITION="left" FOLDED="true">
<node TEXT="DC_PROD_acl">
<arrowlink COLOR="#b2a3e3" DESTINATION="Interface1" ENDARROW="None"/>
</node>
<node TEXT="KIO_UAT_acl">
<arrowlink COLOR="#72f6c1" DESTINATION="Interface2" ENDARROW="None"/>
</node>
</node>
<node TEXT="object-group" POSITION="left" FOLDED="true">
<node ID="Public" TEXT="(port)Media_Port">
<cloud/>
<node TEXT="88"/>
<node TEXT="netbios-ns-netbios-dgm"/>
<node TEXT="389"/>
<node TEXT="domain"/>
</node>
</node>
</node>
</map>

I wrote a Tcl program to convert some of the Cisco firewall rules to Freemind XML nodes. I also tried to introduce some relationships between nodes. However, for 12,000+ lines of firewall rules, the map is going to be very messy and very hard to navigate. It will also take up a lot of memory to visualise the model.
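
To give an idea of what such a conversion involves, here is a minimal shell sketch for the "name" branch only. It is not the Tcl program mentioned above. It assumes the host definitions appear in the configuration as lines of the form "name <ip-address> <hostname>" and that the hostname is good enough as the node ID; the function name names_to_freemind is hypothetical.

```sh
#!/bin/sh
# names_to_freemind: read firewall configuration lines on stdin and print a
# Freemind "name" branch. Only lines of the assumed form
#   name <ip-address> <hostname>
# are used; every other line is ignored.
names_to_freemind() {
    awk '
        # Escape the characters that cannot appear raw in an XML attribute.
        function xml(s) {
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            gsub(/"/, "\\&quot;", s)
            return s
        }
        BEGIN { print "<node TEXT=\"name\" FOLDED=\"true\">" }
        $1 == "name" && NF >= 3 {
            # &#xa; is the line break Freemind stores inside a node label
            printf "<node ID=\"%s\" TEXT=\"%s&#xa;%s\"/>\n", xml($3), xml($3), xml($2)
        }
        END { print "</node>" }
    '
}
```

The output is one branch that can be pasted under the root node of a map file. The relationships (arrowlink elements) need a second pass that knows which IDs exist, which is where most of the real work lies.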

Here is a simplified view of the model:

Labels: Tcl, XML

Removing Files

Sometimes you may encounter problems removing (rm) files in a UNIX environment. It could be because the file name contains spaces or non-printable characters (such as Ctrl-C), or because the file name starts with a minus ("-").

Let's tackle the easy one first, a file name starting with "-". If you run "rm" directly on the file, the "rm" command will complain about an illegal option. What you can do is provide either a relative path ("./") or the full path, so that the first character is no longer the minus sign.

$ touch ./-abc
$ rm -abc
rm: illegal option -- a
rm: illegal option -- b
rm: illegal option -- c
usage: rm [-fiRr] file ...
$ rm ./-abc
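
Another way, sketched below, is to tell rm where the options end. It assumes an rm that follows the POSIX convention that "--" terminates option parsing; the wrapper name remove_dash_file is only for illustration.

```sh
#!/bin/sh
# remove_dash_file: delete a file whose name may start with "-".
# "--" marks the end of the options, so whatever follows is treated as a
# file name even if it looks like an option.
remove_dash_file() {
    rm -- "$1"
}
```

With this, "remove_dash_file -abc" behaves like "rm ./-abc" in the session above.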

So far so good. How about files with non-printable characters? Let's create some of these files. To enter a control character, type Ctrl-V followed by the control character.

$ touch "abc def"
$ touch "hij^Clkm"
$ touch "rst^Mxyz"
$ ls
xyz def  hijlkm  rst

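If you really want to see the actual characters in these file names, you can do an "octal dump" on the output of "ls -1" (ls minus one, i.e. one file name per line). The carriage return in the third name is why "xyz" overwrites the start of the line in the listing above.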

$ ls -1 | od -c
0000000   a   b   c       d   e   f  \n   h   i   j 003   l   k   m  \n
0000020   r   s   t  \r   x   y   z  \n
0000030
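
If the od output is hard to read, a minimal alternative sketch is to pass the listing through the "l" command of sed, which prints each line in an unambiguous form: control characters appear as escapes such as \r or \003, and a "$" marks the end of each name. The function name show_names is hypothetical, and the sketch assumes that ls writes the raw names when its output goes to a pipe.

```sh
#!/bin/sh
# show_names: list the entries of a directory (default: current directory),
# one per line, with non-printable characters shown as backslash escapes.
# "sed -n l" prints each input line in visible form and appends "$".
show_names() {
    ls -1 "${1:-.}" | sed -n l
}
```

For the files created above, the third name would show up as rst\rxyz$ instead of overwriting the start of the line.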

If you have problems deleting any of these files, you can find out the i-node number of the file and delete it using find.

$ ls -li
total 0
    297552 -rw-------   1 chihung  gdz            0 Oct 15 20:18 abc def
    297551 -rw-------   1 chihung  gdz            0 Oct 15 20:18 hijlkm
xyz 297550 -rw-------   1 chihung  gdz            0 Oct 15 20:18 rst

$ find . -inum 297550 -exec rm -i {} \;
xyz (yes/no)? yes

$ ls -li
total 0
    297552 -rw-------   1 chihung  gdz            0 Oct 15 20:18 abc def
    297551 -rw-------   1 chihung  gdz            0 Oct 15 20:18 hijlkm

Remember to use "rm -i" in "find", just in case.
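
The two steps can be wrapped into small helpers. This is a sketch under a few assumptions: the helper names inode_of and rm_by_inum are hypothetical, -xdev is added because i-node numbers are only unique within one file system, and every hard link to the same i-node under the directory will be offered for removal.

```sh
#!/bin/sh
# inode_of: print the i-node number of one path.
inode_of() {
    ls -di -- "$1" | awk 'NR == 1 { print $1 }'
}

# rm_by_inum DIR INUM: interactively remove the entries below DIR that have
# i-node number INUM, without crossing into other file systems.
rm_by_inum() {
    find "$1" -xdev -inum "$2" -exec rm -i -- {} \;
}
```

Usage would be something like rm_by_inum . 297550, answering the prompt just as in the session above.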

Labels: shell script

find find

In a recent performance analysis, I needed to find out how many files there are in each directory and sub-directory, to determine whether any directory is heavily populated.

What I need is to run find within find. I also need to make sure the inner find does not traverse into any sub-directory. In Linux, you can specify "-maxdepth 1" to limit the depth.

First I need to write a script to find the number of files (not directories) in a given directory.

$ cat /tmp/find_f.sh
#! /bin/sh

n=`find "$1" -maxdepth 1 -type f | awk 'END{print NR}'`
echo "$1 - $n"

Once that is done, I have to run the above script for every directory and sub-directory. To find the most populated directories, I simply pipe the output to "sort" with -n for a numeric sort.

$ find /usr/include -type d -exec /tmp/find_f.sh {} \; | sort -n -k 3 | tail
/usr/include/c++/4.1.1/java/awt - 104
/usr/include/evolution-data-server-1.8/camel - 112
/usr/include/c++/4.1.1/javax/swing - 124
/usr/include/c++/4.1.1/gnu/java/locale - 141
/usr/include/kde/dom - 179
/usr/include/boost/mpl - 182
/usr/include/gtk-2.0/gtk - 195
/usr/include - 239
/usr/include/linux - 331
/usr/include/kde - 456
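
On a system whose find has no -maxdepth, the same counts can be obtained in a single pass. The sketch below is an alternative to the find-within-find approach, not the script used above: it lists every regular file once and lets awk count them per parent directory. The function name files_per_dir is hypothetical, directories without any regular file are simply not listed, and file names containing a newline would be miscounted.

```sh
#!/bin/sh
# files_per_dir: print "<count> <directory>" for every directory under $1
# that directly contains at least one regular file, smallest count first.
files_per_dir() {
    find "$1" -type f -print | awk '
        # strip the file name so that only the parent directory remains
        { sub("/[^/]*$", ""); n[$0]++ }
        END { for (d in n) print n[d], d }
    ' | sort -n
}
```

Piping the result to tail, as in "files_per_dir /usr/include | tail", gives the most populated directories. Putting the count first keeps the numeric sort simple even when a path contains spaces.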

Labels: shell script

Saturday, October 06, 2007

Add Contacts in Webmail

I went to Sun's recent Project Blackbox launch the other day and met my marcom colleague on my way to the train station. We had a nice chat and exchanged a lot of ideas regarding how our internal MIS can be improved. One of the things that has bothered them for a long time is email. As marcom, they need a system to blast off emails to our customers regarding events, news, and anything that we want them to know about us. MIS told them they do not have such a system available and therefore marcom has to do this using their existing Outlook.

The current customer database is in an Excel spreadsheet and the marcom folks have to literally copy and paste the entries one by one into the marcom Outlook account on the desktop PC. I am sure there is a better way of doing this; automation is the way to go. I explored both Outlook and the web-based email. With a bit of Googling, I found it is possible to export the MS Excel data to CSV (comma-separated values) and import the CSV file into Outlook. It is easier if you put a header in the first row so that your Excel columns can be mapped exactly to the Outlook contact fields.

What happens if that person is on leave and the other marcom folks do not have access to the desktop? AFAIK, the Outlook contacts are stored on your local desktop, not on the server. BTW, we are running Sun Messaging Server, not MS Exchange. It would be nice to import all these email addresses into the system so that they can be accessed anywhere by anyone. However, such functionality is not available in the system.

With many years of Web Scraping experience, it should not be too hard to do that, right? First, I installed the LiveHTTPHeaders extension in Firefox. Second, I logged in to the mail server and added a contact to the system. I used a particular syntax (like this: _XXXXX_) for all the values in the fields. This syntax enables me to easily identify the values in the encoded string in the POST data.

I broke the POST data into lines for ease of reading:

sid=&security=false&ldapurl=pab&filter=cmd%3DPAB_CMD_ADD_ENTRY%7Cnew%3Dgivenname%3A_FIRSTNAME_%0D%0A
sn%3A_LASTNAME_%0D%0Acn%3A_DISPLAY_%0D%0Aobjectclass%3Apabperson%0D%0Amemberofpab%3AAddressBookc6e5131%0D%0A
mail%3A_EMAIL_%0D%0Atelephonenumber%3A_OFFICE_%0D%0Ahomephone%3A_HOME_%0D%0Amobile%3A_MOBILE_%0D%0A
pager%3A_PAGER_%0D%0Afacsimiletelephonenumber%3A_FAX_%0D%0A%0D%0A%7C

If you do a "man ascii" in Solaris, you will see the hex character encoding:

                          Hexadecimal - Character

     00 NUL   01 SOH   02 STX   03 ETX   04 EOT   05 ENQ   06 ACK  07 BEL
     08 BS    09 HT    0A NL    0B VT    0C NP    0D CR    0E SO   0F SI
     10 DLE   11 DC1   12 DC2   13 DC3   14 DC4   15 NAK   16 SYN  17 ETB
     18 CAN   19 EM    1A SUB   1B ESC   1C FS    1D GS    1E RS   1F US
     20 SP    21 !     22 "     23 #     24 $     25 %     26 &    27 '
     28 (     29 )     2A *     2B +     2C ,     2D -     2E .    2F /
     30 0     31 1     32 2     33 3     34 4     35 5     36 6    37 7
     38 8     39 9     3A :     3B ;     3C <     3D =     3E >    3F ?
     40 @     41 A     42 B     43 C     44 D     45 E     46 F    47 G
     48 H     49 I     4A J     4B K     4C L     4D M     4E N    4F O
     50 P     51 Q     52 R     53 S     54 T     55 U     56 V    57 W
     58 X     59 Y     5A Z     5B [     5C \     5D ]     5E ^    5F _
     60 `     61 a     62 b     63 c     64 d     65 e     66 f    67 g
     68 h     69 i     6A j     6B k     6C l     6D m     6E n    6F o
     70 p     71 q     72 r     73 s     74 t     75 u     76 v    77 w
     78 x     79 y     7A z     7B {     7C |     7D }     7E ~    7F DEL
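
With this table the captured data can be decoded by hand, for example %3A is ":" and %7C is "|". For reading longer captures, a small decoder helps. The sketch below is a generic percent-decoder written for illustration (the name url_decode is hypothetical); it handles %XX sequences only and leaves "+" untouched.

```sh
#!/bin/sh
# url_decode: percent-decode stdin, e.g. "%3A" becomes ":".
# A "%" that is not followed by two hex digits is kept as it is.
url_decode() {
    LC_ALL=C awk '
        BEGIN { hex = "0123456789ABCDEF" }
        {
            out = ""
            while ((i = index($0, "%")) > 0) {
                out = out substr($0, 1, i - 1)
                h = toupper(substr($0, i + 1, 2))
                ok = 0
                if (length(h) == 2) {
                    hi = index(hex, substr(h, 1, 1))
                    lo = index(hex, substr(h, 2, 1))
                    ok = (hi > 0 && lo > 0)
                }
                if (ok) {
                    # two hex digits: emit the character with that code
                    out = out sprintf("%c", (hi - 1) * 16 + (lo - 1))
                    $0 = substr($0, i + 3)
                } else {
                    out = out "%"
                    $0 = substr($0, i + 1)
                }
            }
            print out $0
        }
    '
}
```

Each %0D%0A pair decodes to a real CR NL, so the decoded filter value comes out as one attribute per line, which makes the name-value pairs easy to spot.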

The encoded POST data includes non-printable characters such as %0D (CR) and %0A (NL). So shall I decode it to find out the name-value pairs? The answer is no. What I did was simply prepend a "$" (dollar) sign to each value so that I can get Tcl to do the substitution dynamically.

sid=&security=false&ldapurl=pab&filter=cmd%3DPAB_CMD_ADD_ENTRY%7Cnew%3Dgivenname%3A$_FIRSTNAME_%0D%0A
sn%3A$_LASTNAME_%0D%0Acn%3A$_DISPLAY_%0D%0Aobjectclass%3Apabperson%0D%0Amemberofpab%3AAddressBookc6e5131%0D%0A
mail%3A$_EMAIL_%0D%0Atelephonenumber%3A$_OFFICE_%0D%0Ahomephone%3A$_HOME_%0D%0Amobile%3A$_MOBILE_%0D%0A
pager%3A$_PAGER_%0D%0Afacsimiletelephonenumber%3A$_FAX_%0D%0A%0D%0A%7C

All I have to do is export the Excel data to CSV (I chose tab-separated to avoid comma characters in the data) and get the program to loop through every line. For each line, I extract the email address, name, ... and set the corresponding variables. The correct POST data will immediately fall into place. BTW, the login process needs to capture the Cookie, which is the session key for the entire process.

Below are the script and the screen dumps (add contact, LiveHTTPHeaders):
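
One thing to watch when values are dropped straight into an already encoded string: a company or customer name with spaces or symbols is not encoded. Whether the mail server minds is not known here, so take the following as an optional safeguard. It is a minimal shell sketch of percent-encoding a single value before substitution; the name url_encode is hypothetical, and in Tcl the http package's formatQuery (already used for the login) does the same job for name-value pairs.

```sh
#!/bin/sh
# url_encode: percent-encode the string given as $1.
# Letters, digits and . _ ~ - are kept, every other byte becomes %XX.
url_encode() {
    printf '%s' "$1" | LC_ALL=C awk '
        BEGIN {
            # lookup table from a single byte to its numeric code
            for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
        }
        {
            if (NR > 1) out = out "%0A"      # a newline inside the value
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (c ~ /[A-Za-z0-9._~-]/) out = out c
                else out = out sprintf("%%%02X", ord[c])
            }
        }
        END { printf "%s", out }
    '
}
```

For example, url_encode "Acme - John" gives Acme%20-%20John, which can then be placed after cn%3A in the POST data.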

#! /usr/bin/tclsh

if { $argc != 1 } {
 puts stderr "Usage: $argv0 <tab-separated-csv>"
 exit 1
}
set tabcsv [lindex $argv 0]

# Collect the Set-Cookie headers of a finished http request and
# return them as one Cookie header value ("n1=v1; n2=v2")
proc getCookieHeader { s } {
 set cookie {}
 foreach {n v} [set ${s}(meta)] {
  if { [string equal -nocase {Set-Cookie} $n] } {
   lappend cookie $v
  }
 }
 set set_cookie {}
 foreach c $cookie {
  append set_cookie "[lindex [split $c {;}] 0]; "
 }
 return [string range $set_cookie 0 end-2]
}

# mail account used for the login
set u username; set p password

package require http

set url            "http://some.mail.server"
set url_login      "$url/login.msc"
set url_addcontact "$url/ldap.msc"
set url_logout     "$url/cmd.msc?sid=&mbox=&cmd=logout&security=false&lang=en"

# login and keep the session cookie for the following requests
set s [http::geturl $url_login -query [http::formatQuery user $u password $p]]
set header [getCookieHeader $s]
http::cleanup $s

set fp [open $tabcsv r]
# skip  first line
gets $fp line
while { [gets $fp line] >= 0 } {
 set linelist [split $line "\t"]
 set company [lindex $linelist 0]
 set custname [lindex $linelist 1]
 set custemail [lindex $linelist 2]

 if { [string length $custemail]==0 || [llength $linelist]<=3 } { 
  continue
 }
 puts $custemail

 set _FIRSTNAME_ ""
 set _LASTNAME_  "$custname"
 set _DISPLAY_ "$company - $custname"
 set _EMAIL_ "$custemail"
 set _OFFICE_ ""
 set _HOME_ ""
 set _MOBILE_ ""
 set _PAGER_ ""
 set _FAX_ ""

 # add contact
 set contact_query "sid=&security=false&ldapurl=pab&filter=cmd%3DPAB_CMD_ADD_ENTRY%7Cnew%3Dgivenname%3A$_FIRSTNAME_%0D%0Asn%3A$_LASTNAME_%0D%0Acn%3A$_DISPLAY_%0D%0Aobjectclass%3Apabperson%0D%0Amemberofpab%3AAddressBookc6e5131%0D%0Amail%3A$_EMAIL_%0D%0Atelephonenumber%3A$_OFFICE_%0D%0Ahomephone%3A$_HOME_%0D%0Amobile%3A$_MOBILE_%0D%0Apager%3A$_PAGER_%0D%0Afacsimiletelephonenumber%3A$_FAX_%0D%0A%0D%0A%7C"
 set s [http::geturl $url_addcontact -headers [list Cookie $header] -query $contact_query]

 http::cleanup $s
}
close $fp

# logout
set s [http::geturl $url_logout -headers [list Cookie $header]]
http::cleanup $s

Labels: Tcl, Web Scraping

Firewall Navigation

A colleague of mine has not been getting enough sleep lately because he has to understand 12,000+ lines of firewall rules. The customer wants him to find out the relationships among the 700+ named hosts, 500+ object-groups, 10+ access-groups and 10+ interfaces. He has to literally use Ctrl-F to find each object and copy and paste it into MS Excel. He spent at least a whole week on this exercise and was not able to finish half of the rules.

I happened to bump into him the other day in the data centre and thought I might be able to help him. At first, I thought I could model this as a directed graph and visualise it using Graphviz. It turned out to be quite awkward and not easy to model. I also explored converting the rules to XML, but I would need a good XML navigator to traverse them, and I couldn't find one. After some thought, I realised I may be able to use Freemind, a free mind mapping software, to visualise the data. This looked very promising but it may take a while for me to implement something useful for my colleague. He needs a tool now.

The implementation that I am going to show you is pretty easy. First, I need to plant my anchors (<a name=>) for each host name, object-group, access-group, interface and line number. Second, I need to find all the references to the above anchors (that's about 1500+) and turn them into hyperlinks. A CGI shell script 'grep's the pattern when the user clicks on any of the anchors in the left menu; the script also dynamically highlights the search word in red. All the hyperlinks are clickable, so you can jump to the referenced anchor in the original firewall configuration file for details, which is especially useful if the link is an object-group. Line numbers are also dynamically hyperlinked.

The whole process of converting the plain text firewall configuration to a dynamic web front end can be achieved with shell scripts and Tcl. Tcl "string map" is very powerful in converting references to HTML hyperlinks. My original implementation using the 'dumb' way took hours to run. With "string map", it took 19 seconds! Such a performance gain can only be achieved by using the right tool for the right job. BTW, lots of thinking was involved before the actual implementation.

I blurred the screen dump to hide the actual firewall details. Just to cover my ass. FYI, I will still explore the Freemind way when I have more time.
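
To make the anchor and hyperlink idea concrete, here is a much reduced shell sketch. It is not the implementation described above: it uses awk token matching instead of Tcl "string map", it only handles host names, and it assumes they are defined by lines of the form "name <ip-address> <hostname>". The function name config_to_html is hypothetical.

```sh
#!/bin/sh
# config_to_html FILE: print FILE as HTML with
#   - an anchor per line number        (<a name="L12">12</a>)
#   - an anchor where a host is defined (assumed "name <ip> <hostname>")
#   - a hyperlink wherever a known hostname is used as a separate word
# The file is read twice: pass 1 collects the names, pass 2 writes the HTML.
# Runs of blanks are collapsed to single spaces in the output.
config_to_html() {
    awk '
        function esc(s) {
            gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s); gsub(/>/, "\\&gt;", s)
            return s
        }
        NR == FNR {                                   # pass 1
            if ($1 == "name" && NF >= 3) host[$3] = 1
            next
        }
        FNR == 1 { print "<pre>" }
        {                                             # pass 2
            line = ""
            for (i = 1; i <= NF; i++) {
                w = esc($i)
                if ($1 == "name" && i == 3) w = "<a name=\"" w "\">" w "</a>"
                else if ($i in host)        w = "<a href=\"#" w "\">" w "</a>"
                line = line (i > 1 ? " " : "") w
            }
            printf "<a name=\"L%d\">%d</a> %s\n", FNR, FNR, line
        }
        END { print "</pre>" }
    ' "$1" "$1"
}
```

The lookup in the host array plays the role of the "string map" table: each word is checked once against all known names, instead of scanning the whole file once per name, which is presumably what made the slow version take hours.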

Labels: shell script, Tcl
