Wednesday, October 17, 2007

Table^10

If you watch TV often enough, you will probably come across this ad about the new web site, Mocca.com (MediaCorp Online Communities and Classified Advertising).

The above site looks pretty good in my Firefox, with a response time of about 3 seconds for 121 requests (413KB in total). These numbers come from the Firefox add-on Firebug. More than 100 requests for a single page is quite a lot.

What I normally do when I find a site "interesting" is look at the HTML code. Guess what, I realised that there are a lot of issues with the HTML source.

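  • The redirection is not done properly. Normally the server would answer with a 3xx status and a Location header, so the browser can go straight to the new URL without the overhead of rendering an HTML page first. Here the server returns 200 OK with a Content-Location header (which does not trigger a redirect by itself) and relies on a meta refresh in the body. Also, the HTML is definitely ill-formed and incomplete.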
    $ curl --dump-header /dev/tty http://mocca.com/
    HTTP/1.1 200 OK
    Content-Length: 61
    Content-Type: text/html
    Content-Location: http://mocca.com/index.htm
    Last-Modified: Wed, 26 Sep 2007 09:30:19 GMT
    Accept-Ranges: bytes
    ETag: "b8a39eda1f0c81:17de"
    X-Powered-By: ASP.NET
    Date: Tue, 16 Oct 2007 08:46:21 GMT
    Server: Concealed by Juniper Networks DX
    Via: 1.1 MC-LB1 (Juniper Networks Application Acceleration Platform - DX 5.2.5 0)
    Set-Cookie: rl-sticky-key=caacb80750; path=/; expires=Tue, 16 Oct 2007 09:31:03GMT
    
    <META HTTP-EQUIV="Refresh" CONTENT="1; URL=/portal/site/cas">
    
  • When I fetched the home page http://mocca.com/portal/site/cas, I realised that the HTML code has the <title> node before the <html> node. If you were to convert the HTML to a DOM tree, you would likely end up with just 1 node in the tree. Also, I submitted the URL to the W3C HTML Validation Service, and it reported 218 errors.
    $ curl --silent http://mocca.com/portal/site/cas 2>&1 | awk '$0!~/^$/{print}' | head -10
    
    <title>MediaCorp Mocca </title>
    <html xmlns="http://www.w3.org/1999/xhtml">
            <head>
    
    <link href="/vgn-ext-templating/common/styles/vgn-ext-templating.css" rel="stylesheet" type="text/css"></link>
              <script language="JavaScript" src="/portal/jslib/form_state_manager.js"></script>
              <noscript>In order to bring you the best possible user experience, this site uses Javascript. If you are seeing this message, it is likely that the Javascript option in your browser is disabled. For optimal viewing of this site, please ensure that Javascript is enabled for your browser.
              </noscript>
                    <base target="_top">
    
  • I realised there are a lot of tables within tables within tables. Hey, that's interesting. At the back of my mind I was wondering how deep this nesting of tables goes. A Tcl program with the tDOM extension should do the job. Of course I have to remove the first <title> before parsing, otherwise I will end up with 1 node in the tree.
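    package require tdom

    # Count how many <table> elements lie on the path from the root to $node
    # (including $node itself) by looking at each step of its XPath.
    proc howdeep { node } {
     set count 0
     foreach step [split [$node toXPath] /] {
      # A step looks like "table" or "table[2]"
      if { [string match -nocase "table*" $step] } {
       incr count
      }
     }
     return $count
    }

    # Saved copy of the home page with the stray leading <title> removed
    set html index.html-modified
    set doc [dom parse -html [tDOM::xmlReadFile $html]]
    set root [$doc documentElement]

    # Walk through every table and remember the most deeply nested one
    set tables [$root selectNodes {//table}]
    set max 0
    set maxnode {}
    foreach table $tables {
     set level [howdeep $table]
     if { $level > $max } {
      set max $level
      set maxnode $table
     }
    }
    puts $max
    puts [llength $tables]
    # Only print the path if at least one table was found
    if { $maxnode ne {} } {
     puts [$maxnode toXPath]
    }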
    
    Output of this program:
    10
    144
    /html/body/table/tr[2]/td[2]/table/tr[2]/td/table/tr/td[3]/table/tr/td/table/tr/td/table[1]/tr/td[1]/table/tr[2]/td[2]/table/tr/td[1]/table/tr[1]/td/table
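
As a rough cross-check of the nesting depth that does not need a DOM parser (so the stray <title> does not matter), the table tags can simply be counted as they open and close. This is only a sketch: the function name max_table_depth is my own, it assumes every <table> has a matching </table>, and it would also count tags that sit inside comments or scripts.

```shell
#!/bin/sh
# Print "<max nesting depth> <total tables>" for the HTML file given as $1.
# Hypothetical helper, not part of the original Tcl program.
max_table_depth() {
    # Put every opening or closing table tag on its own line
    # (case-insensitive, and tolerate a tag that ends at a line break).
    grep -oiE '</?table([[:space:]>]|$)' "$1" |
    awk '
        BEGIN  { depth = 0; max = 0; total = 0 }
        # Closing tag: go one level up, never below zero
        /^<\// { if (depth > 0) depth--; next }
        # Opening tag: go one level down and remember the deepest point
               { depth++; total++; if (depth > max) max = depth }
        END    { print max, total }
    '
}
```

Running max_table_depth on the saved page prints the two numbers on one line, which can then be compared with the first two lines of the Tcl output.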
    
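Going back to the first point about the redirection: the difference between a real HTTP redirect and a meta refresh can be read straight off the response. Here is a small sketch that classifies a response dump fed to it on standard input. The function name redirect_kind and the three labels it prints are made up for illustration.

```shell
#!/bin/sh
# Read a raw HTTP response (status line, headers, body) from stdin and
# print how it redirects: "http-redirect", "meta-refresh" or "none".
# Hypothetical helper for illustration only.
redirect_kind() {
    awk '
        # First line looks like "HTTP/1.1 200 OK"; field 2 is the status code
        NR == 1 { status = $2 }
        # A real redirect carries a Location header (Content-Location does not count)
        tolower($0) ~ /^location:/ { has_location = 1 }
        # Client-side fallback: <META HTTP-EQUIV="Refresh" ...> in the body
        tolower($0) ~ /http-equiv="?refresh/ { has_meta = 1 }
        END {
            if (status ~ /^3/ && has_location) print "http-redirect"
            else if (has_meta)                 print "meta-refresh"
            else                               print "none"
        }
    '
}
```

It can be fed from curl, for example: curl --silent --include http://mocca.com/ | redirect_kind. For the response shown in the first point it prints meta-refresh, since the status is 200 and there is no Location header.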

Wow, we are talking about 10 levels of nested <table> elements and a total of 144 tables. That's a lot! So, what's the conclusion?

  • Modern browsers are very forgiving, and they normally do a very good job of rendering complex and even ill-formed HTML code.
  • Even with such deep nesting of tables and 100+ requests, the browser is able to render the content in seconds. That is amusing.
