Thursday, May 17, 2007

RSS Feeds from Government Sites? Only One

I was asked to write a feed aggregator for a project, so I tried to aggregate the government-related sites (150+) as an example. Guess what: of the sites that were reachable, none advertises an RSS or Atom feed in its HTML head, except the official government site.

I wrote a Tcl program using packages such as http and tDOM to do this exploratory work. BTW, ActiveTcl from ActiveState has a lot of these extensions pre-compiled. I extracted all the government-related sites from their "A-Z Government List" and looped through them to see whether each one declares a feed in its head. Here is a snippet of the Tcl that uses XPath syntax to locate the feed link:

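# $root is the document element of the parsed page; $url is the site being checked.
# selectNodes is the documented tDOM method name for evaluating an XPath expression.
set result [$root selectNodes {//link[@type="application/rss+xml" or @type="application/atom+xml"]}]
if { [llength $result] > 0 } {
 puts "Yes - $url"
} else {
 puts "No  - $url"
}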
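
For context, the fetch-and-parse steps around that snippet might look roughly like the sketch below. This is not the original program, only a minimal reconstruction under some assumptions: the proc names hasFeedLink and checkSite are made up for illustration, the 10-second timeout is arbitrary, and plain http::geturl neither follows redirects nor speaks https without extra setup, so some sites would show up as errors.

```tcl
package require http
package require tdom

# Return 1 if the HTML text contains a <link> element whose type
# is RSS or Atom, 0 otherwise. (hasFeedLink is a hypothetical name.)
proc hasFeedLink {html} {
    set doc [dom parse -html $html]
    set root [$doc documentElement]
    set found 0
    if {$root ne ""} {
        set nodes [$root selectNodes {//link[@type="application/rss+xml" or @type="application/atom+xml"]}]
        if {[llength $nodes] > 0} {
            set found 1
        }
    }
    $doc delete
    return $found
}

# Fetch one page and print Yes/No/Err for it. Fetch and parse
# failures are reported instead of aborting the whole run.
# (checkSite is a hypothetical name; the timeout value is an assumption.)
proc checkSite {url} {
    if {[catch {
        set token [http::geturl $url -timeout 10000]
        set html [http::data $token]
        http::cleanup $token
        set found [hasFeedLink $html]
    } err]} {
        puts "Err - $url ($err)"
        return
    }
    if {$found} {
        puts "Yes - $url"
    } else {
        puts "No  - $url"
    }
}
```

With a list of site URLs in a variable, say siteList, the whole survey is then one loop: foreach url $siteList { checkSite $url }.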

So the big question is: why isn't there any feed? Maybe the feed link is embedded in the HTML body instead of the head, or maybe there is intellectual property in the feed that is too valuable to expose to the rest of the world.

Anyway, I will let you figure that out.
