Building a High Performance Cluster with Amazon Web Services
Labels: HPC
The Scripting Guy in the Lion City with a performance sense.
$ cat ns.xml
<?xml version="1.0"?>
<Tests xmlns="http://www.adatum.com">
  <Test TestId="0001" TestType="CMD">
    <Name>Convert number to string</Name>
    <CommandLine>Examp1.EXE</CommandLine>
    <Input>1</Input>
    <Output>One</Output>
  </Test>
  <Test TestId="0002" TestType="CMD">
    <Name>Find succeeding characters</Name>
    <CommandLine>Examp2.EXE</CommandLine>
    <Input>abc</Input>
    <Output>def</Output>
  </Test>
  <Test TestId="0003" TestType="GUI">
    <Name>Convert multiple numbers to strings</Name>
    <CommandLine>Examp2.EXE /Verbose</CommandLine>
    <Input>123</Input>
    <Output>One Two Three</Output>
  </Test>
  <Test TestId="0004" TestType="GUI">
    <Name>Find correlated key</Name>
    <CommandLine>Examp3.EXE</CommandLine>
    <Input>a1</Input>
    <Output>b1</Output>
  </Test>
  <Test TestId="0005" TestType="GUI">
    <Name>Count characters</Name>
    <CommandLine>FinalExamp.EXE</CommandLine>
    <Input>This is a test</Input>
    <Output>14</Output>
  </Test>
  <Test TestId="0006" TestType="GUI">
    <Name>Another Test</Name>
    <CommandLine>Examp2.EXE</CommandLine>
    <Input>Test Input</Input>
    <Output>10</Output>
  </Test>
</Tests>
$ xmllint --shell ns.xml
/ > cd Tests
Tests is a 0 Node Set
/ >
To traverse an XML file that declares a default namespace, you first need to bind that namespace to a prefix with setns, then use the prefix in every step of the path.
$ head -2 ns.xml
<?xml version="1.0"?>
<Tests xmlns="http://www.adatum.com">
$ xmllint --shell ns.xml
/ > setns a=http://www.adatum.com
/ > cd a:Tests
Tests > cd a:Test
a:Test is a 6 Node Set
Tests > cd a:Test[3]
Test > dir
ELEMENT Test
  ATTRIBUTE TestId
    TEXT
      content=0003
  ATTRIBUTE TestType
    TEXT
      content=GUI
Test > cat
<Test TestId="0003" TestType="GUI">
    <Name>Convert multiple numbers to strings</Name>
    <CommandLine>Examp2.EXE /Verbose</CommandLine>
    <Input>123</Input>
    <Output>One Two Three</Output>
  </Test>
Test >
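The same rule (no prefix, no match) shows up outside xmllint too. Below is a minimal sketch using Python's standard xml.etree.ElementTree, assuming a trimmed-down copy of ns.xml with only the first three tests. The prefix "a" and the variable names are my own choices for illustration; only the namespace URI comes from the file.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of ns.xml: same default namespace, first three tests only.
NS_XML = """<?xml version="1.0"?>
<Tests xmlns="http://www.adatum.com">
  <Test TestId="0001" TestType="CMD"><Name>Convert number to string</Name></Test>
  <Test TestId="0002" TestType="CMD"><Name>Find succeeding characters</Name></Test>
  <Test TestId="0003" TestType="GUI"><Name>Convert multiple numbers to strings</Name></Test>
</Tests>"""

root = ET.fromstring(NS_XML)

# Without a prefix the path looks for <Test> in no namespace and finds nothing,
# just like "cd Tests" reporting a 0 Node Set.
unprefixed = root.findall("Test")

# Bind the default namespace URI to an arbitrary prefix, as setns does.
ns = {"a": "http://www.adatum.com"}
prefixed = root.findall("a:Test", ns)

third = prefixed[2]
print(len(unprefixed), len(prefixed))                # 0 3
print(third.get("TestId"), third.find("a:Name", ns).text)
# 0003 Convert multiple numbers to strings
```

Attributes such as TestId carry no prefix here because unprefixed attributes are not placed in the default namespace.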
If you have more than one namespace to work with, bind each one to its own prefix. The prefixes do not have to match the ones declared in the document; only the namespace URIs have to match.
$ cat ns2.xml
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <h:table>
    <h:tr align="center">
     <h:td>Author</h:td><h:td>Price</h:td>
     <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
    <h:tr align="left">
     <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
     <h:td><xdc:price>31.98</xdc:price></h:td>
     <h:td><xdc:pages>352</xdc:pages></h:td>
     <h:td><xdc:date>1998/01</xdc:date></h:td>
    </h:tr>
   </h:table>
  </xdc:bookreview>
 </h:body>
</h:html>
$ xmllint --shell ns2.xml
/ > cd h:html
h:html is a 0 Node Set
/ > setns h=http://www.w3.org/HTML/1998/html4
/ > setns xdc=http://www.xml.com/books
/ > cd h:html/h:body/xdc:bookreview/xdc:title
title > cat
<xdc:title>XML: A Primer</xdc:title>
title >
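To illustrate that the prefix names are free to choose, here is a small sketch with Python's standard xml.etree.ElementTree on a shortened copy of ns2.xml. The prefixes "page" and "book" are made up for this example and deliberately differ from the h and xdc used in the document; the lookup still works because only the URIs are compared.

```python
import xml.etree.ElementTree as ET

# Shortened copy of ns2.xml: the table is left out, the namespaces are unchanged.
NS2_XML = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
    </xdc:bookreview>
  </h:body>
</h:html>"""

root = ET.fromstring(NS2_XML)

# Hypothetical prefixes chosen for this sketch; they map to the same URIs
# that the document binds to "h" and "xdc".
ns = {
    "page": "http://www.w3.org/HTML/1998/html4",
    "book": "http://www.xml.com/books",
}

title = root.find("page:body/book:bookreview/book:title", ns)
print(title.text)   # XML: A Primer
print(root.tag)     # {http://www.w3.org/HTML/1998/html4}html
```

The second line of output shows how the parser stores the element name internally: as the namespace URI plus the local name, with no trace of the original prefix.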
Labels: XML
After some thought, I decided I should validate my answer. I downloaded a fairly sizeable XML file from the Mondial project for my test and deliberately removed one of the closing tags so that the file is no longer well-formed. Neither tdom nor Firefox can identify where the closing tag is missing; both only report the mismatch at the end of the file (line 39564). Only xmllint pinpoints the problem: it also reports the line of the opening tag that was never closed (line 16795).
$ diff mondial.xml mondial-malformed.xml
16819d16818
< </country>
$ firefox mondial-malformed.xml
Firefox XML Parsing Error: mismatched tag. Expected: </country>.
Location: file:///home/chihung/Projects/xmllint/mondial-malformed.xml
Line Number 39564, Column 3:
</mondial>
--^
$ tclsh
% package require tdom
0.8.3
% set doc [dom parse [tDOM::xmlReadFile mondial-malformed.xml]]
error "mismatched tag" at line 39564 character 2
"ude> </desert> </m <--Error-- ondial> "
$ xmllint --shell mondial-malformed.xml
mondial-malformed.xml:39564: parser error : Opening and ending tag mismatch: country line 16795 and mondial
</mondial>
          ^
mondial-malformed.xml:39565: parser error : Premature end of data in tag mondial line 3

^
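What makes the xmllint message more useful is the line of the opening tag. As a rough sketch of how that information can be recovered, the Python snippet below drives the standard xml.parsers.expat parser and keeps its own stack of open elements with their starting lines. The function name find_unclosed and the tiny sample document are made up for this illustration, and this is not a description of how xmllint works internally.

```python
import xml.parsers.expat

def find_unclosed(xml_text):
    """Return details of the first well-formedness error, or None if the text parses."""
    parser = xml.parsers.expat.ParserCreate()
    open_elements = []  # stack of (tag name, line where the tag was opened)

    def on_start(name, attrs):
        open_elements.append((name, parser.CurrentLineNumber))

    def on_end(name):
        open_elements.pop()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        # The innermost element still open when the parser gave up.
        name, line = open_elements[-1] if open_elements else (None, None)
        return {"error_line": err.lineno, "open_tag": name, "open_line": line}
    return None

# Made-up miniature of the experiment: the second </country> is missing.
SAMPLE = "\n".join([
    "<mondial>",
    "  <country>",
    "    <name>A</name>",
    "  </country>",
    "  <country>",
    "    <name>B</name>",
    "</mondial>",
])

print(find_unclosed(SAMPLE))
# {'error_line': 7, 'open_tag': 'country', 'open_line': 5}
```

The parser itself only knows that the mismatch shows up on line 7; the stack is what ties it back to the element opened on line 5.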
OK, xmllint is clearly the winner in this exercise. Here is xmllint in action:
$ xmllint --shell mondial.xml
/ > help
        base         display XML base of the node
        setbase URI  change the XML base of the node
        bye          leave shell
        cat [node]   display node or current node
        cd [path]    change directory to path or to root
        dir [path]   dumps informations about the node (namespace, attributes, content)
        du [path]    show the structure of the subtree under path or the current node
        exit         leave shell
        help         display this help
        free         display memory usage
        load [name]  load a new document with name
        ls [path]    list contents of path or the current directory
        set xml_fragment replace the current node content with the fragment parsed in context
        xpath expr   evaluate the XPath expression in that context and print the result
        setns nsreg  register a namespace to a prefix in the XPath evaluation context
                     format for nsreg is: prefix=[nsuri] (i.e. prefix= unsets a prefix)
        setrootns    register all namespace found on the root element
                     the default namespace if any uses 'defaultns' prefix
        pwd          display current working directory
        quit         leave shell
        save [name]  save this document to name or the original name
        write [name] write the current node to the filename
        validate     check the document for errors
        relaxng rng  validate the document agaisnt the Relax-NG schemas
        grep string  search for a string in the subtree
/ > validate
mondial.xml:35144: element island: validity error : Syntax of value for attribute sea of island is not valid
validity error : attribute sea line 35144 references an unknown ID ""
/ > base
mondial.xml
/ > dir
DOCUMENT
version=1.0
encoding=UTF-8
URL=mondial.xml
standalone=true
/ > grep Singapore
/mondial/country[105]/name : t-- 9 Singapore
/mondial/country[105]/city/name : t-- 9 Singapore
/mondial/island[163]/name : t-- 9 Singapore
/ > cd /mondial/country[105]
country > cat
<country car_code="SGP" area="632.6" capital="cty-Singapore-Singapore" memberships="org-AsDB org-ASEAN org-Mekong-Group org-CP org-C org-CCC org-ESCAP org-G-77 org-IAEA org-IBRD org-ICC org-ICAO org-ICFTU org-Interpol org-IFRCS org-IFC org-ILO org-IMO org-Inmarsat org-IMF org-IOC org-ISO org-ICRM org-ITU org-Intelsat org-NAM org-PCA org-UN org-UNIKOM org-UPU org-WHO org-WIPO org-WMO org-WTrO">
      <name>Singapore</name>
      <population>3396924</population>
      <population_growth>1.9</population_growth>
      <infant_mortality>4.7</infant_mortality>
      <gdp_total>66100</gdp_total>
      <gdp_ind>28</gdp_ind>
      <gdp_serv>72</gdp_serv>
      <inflation>1.7</inflation>
      <indep_date>1965-08-09</indep_date>
      <government>republic within Commonwealth</government>
      <encompassed continent="asia" percentage="100"/>
      <ethnicgroups percentage="6.4">Indian</ethnicgroups>
      <ethnicgroups percentage="76.4">Chinese</ethnicgroups>
      <ethnicgroups percentage="14.9">Malay</ethnicgroups>
      <city id="cty-Singapore-Singapore" is_country_cap="yes" country="SGP">
         <name>Singapore</name>
         <longitude>103.833</longitude>
         <latitude>1.3</latitude>
         <population year="87">2558000</population>
         <located_at watertype="sea" sea="sea-SouthChinaSea"/>
         <located_on island="island-Singapore"/>
      </city>
   </country>
Finding countries with an infant_mortality lower than Singapore's (4.7):
country > xpath //country[infant_mortality<4.7]/name/text()
Object is a Node Set :
Set contains 9 nodes:
1  TEXT
    content=Andorra
2  TEXT
    content=Sweden
3  TEXT
    content=Iceland
4  TEXT
    content=Jersey
5  TEXT
    content=Man
6  TEXT
    content=Hong Kong
7  TEXT
    content=Japan
8  TEXT
    content=Anguilla
9  TEXT
    content=Bermuda
country > quit
The same query can be run straight from the command line too.
$ time xmllint --xpath '//country[infant_mortality<4.7]/name' --format mondial.xml
<name>Andorra</name><name>Sweden</name><name>Iceland</name><name>Jersey</name><name>Man</name><name>Hong Kong</name><name>Japan</name><name>Anguilla</name><name>Bermuda</name>

real 0m0.219s
user 0m0.192s
sys 0m0.020s
Alternatively, you can do the above dynamically, using a nested XPath expression to look up Singapore's value instead of hard-coding 4.7:
$ xmllint --shell mondial.xml
/ > xpath //country[infant_mortality<//country[name="Singapore"]/infant_mortality]/name/text()
Object is a Node Set :
Set contains 9 nodes:
1  TEXT
    content=Andorra
2  TEXT
    content=Sweden
3  TEXT
    content=Iceland
4  TEXT
    content=Jersey
5  TEXT
    content=Man
6  TEXT
    content=Hong Kong
7  TEXT
    content=Japan
8  TEXT
    content=Anguilla
9  TEXT
    content=Bermuda

$ time xmllint --xpath '//country[infant_mortality<//country[name="Singapore"]/infant_mortality]/name' --format mondial.xml
<name>Andorra</name><name>Sweden</name><name>Iceland</name><name>Jersey</name><name>Man</name><name>Hong Kong</name><name>Japan</name><name>Anguilla</name><name>Bermuda</name>

real 0m2.074s
user 0m2.052s
sys 0m0.016s
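The nested query took about ten times longer than the hard-coded one (2.074s against 0.219s). I have not verified why, but one plausible explanation is that the inner //country[name="Singapore"] lookup is repeated for every candidate country. The sketch below shows the alternative of looking the threshold up once and then filtering, using Python's standard xml.etree.ElementTree. The function name countries_below is my own, and the sample data is made up except for Singapore's 4.7.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of mondial.xml; only Singapore's 4.7 is real.
SAMPLE = """<mondial>
  <country><name>Singapore</name><infant_mortality>4.7</infant_mortality></country>
  <country><name>Lowland</name><infant_mortality>4.1</infant_mortality></country>
  <country><name>Highland</name><infant_mortality>9.3</infant_mortality></country>
  <country><name>Nodata</name></country>
</mondial>"""

def countries_below(root, reference_name):
    """Names of countries whose infant_mortality is below the reference country's."""
    # Step 1: look up the reference value once.
    threshold = None
    for country in root.iter("country"):
        if country.findtext("name") == reference_name:
            threshold = float(country.findtext("infant_mortality"))
            break
    if threshold is None:
        raise ValueError("reference country not found: " + reference_name)

    # Step 2: a single pass over all countries, skipping those without a value.
    names = []
    for country in root.iter("country"):
        value = country.findtext("infant_mortality")
        if value is not None and float(value) < threshold:
            names.append(country.findtext("name"))
    return names

root = ET.fromstring(SAMPLE)
print(countries_below(root, "Singapore"))   # ['Lowland']
```

The filtering is done in Python rather than in the path expression because ElementTree's limited XPath support has no numeric comparison such as infant_mortality<4.7.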
xmllint is definitely the preferred XML companion. Compared with Firefox and tdom, it is extremely fast and efficient.