xmllint - Answer to an XML Question
After some thought, I decided I should validate my answer. I downloaded a fairly sizeable XML file from the Mondial project for my test and deliberately removed one of the closing tags to make it not well-formed. Both tdom and Firefox only report the line where the mismatch is finally detected (line 39564, at </mondial>). Only xmllint goes further and names the line where the unclosed <country> element opens (16795), which is 24 lines above the removed tag (16819):
$ diff mondial.xml mondial-malformed.xml
16819d16818
< </country>

$ firefox mondial-malformed.xml
Firefox
XML Parsing Error: mismatched tag. Expected: </country>.
Location: file:///home/chihung/Projects/xmllint/mondial-malformed.xml
Line Number 39564, Column 3:</mondial>
--^

$ tclsh
% package require tdom
0.8.3
% set doc [dom parse [tDOM::xmlReadFile mondial-malformed.xml]]
error "mismatched tag" at line 39564 character 2
"ude>
</desert>
</m <--Error-- ondial>
"

$ xmllint --shell mondial-malformed.xml
mondial-malformed.xml:39564: parser error : Opening and ending tag mismatch: country line 16795 and mondial
</mondial>
          ^
mondial-malformed.xml:39565: parser error : Premature end of data in tag mondial line 3

^
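If you only want a yes/no answer on well-formedness, the same check can be scripted. Here is a minimal sketch, assuming xmllint is on your PATH; the two tiny files and the workdir variable are made up for illustration and stand in for mondial.xml and mondial-malformed.xml.

```shell
#!/bin/sh
# Hypothetical sample data: a tiny document and a copy with one closing tag removed.
workdir=$(mktemp -d)
cat > "$workdir/good.xml" <<'EOF'
<?xml version="1.0"?>
<mondial>
  <country><name>Singapore</name></country>
</mondial>
EOF
# Drop the </country> closing tag so the copy is no longer well-formed.
sed 's|</country>||' "$workdir/good.xml" > "$workdir/bad.xml"

# --noout suppresses the serialized document, so only parser errors are printed.
# xmllint exits non-zero when the file cannot be parsed.
for f in good.xml bad.xml; do
  if xmllint --noout "$workdir/$f" 2>"$workdir/$f.err"; then
    echo "$f: well-formed"
  else
    echo "$f: NOT well-formed"
    cat "$workdir/$f.err"
  fi
done
```

The error text saved for bad.xml should name the unclosed element and the line where it opens, just like the session above.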
OK, xmllint is clearly the winner in this exercise. Here is the xmllint interactive shell in action:
$ xmllint --shell mondial.xml
/ > help
    base         display XML base of the node
    setbase URI  change the XML base of the node
    bye          leave shell
    cat [node]   display node or current node
    cd [path]    change directory to path or to root
    dir [path]   dumps informations about the node (namespace, attributes, content)
    du [path]    show the structure of the subtree under path or the current node
    exit         leave shell
    help         display this help
    free         display memory usage
    load [name]  load a new document with name
    ls [path]    list contents of path or the current directory
    set xml_fragment replace the current node content with the fragment parsed in context
    xpath expr   evaluate the XPath expression in that context and print the result
    setns nsreg  register a namespace to a prefix in the XPath evaluation context
                 format for nsreg is: prefix=[nsuri] (i.e. prefix= unsets a prefix)
    setrootns    register all namespace found on the root element
                 the default namespace if any uses 'defaultns' prefix
    pwd          display current working directory
    quit         leave shell
    save [name]  save this document to name or the original name
    write [name] write the current node to the filename
    validate     check the document for errors
    relaxng rng  validate the document agaisnt the Relax-NG schemas
    grep string  search for a string in the subtree
/ > validate
mondial.xml:35144: element island: validity error : Syntax of value for attribute sea of island is not valid
validity error : attribute sea line 35144 references an unknown ID ""
/ > base
mondial.xml
/ > dir
DOCUMENT
version=1.0
encoding=UTF-8
URL=mondial.xml
standalone=true
/ > grep Singapore
/mondial/country[105]/name : t-- 9 Singapore
/mondial/country[105]/city/name : t-- 9 Singapore
/mondial/island[163]/name : t-- 9 Singapore
/ > cd /mondial/country[105]
country > cat
<country car_code="SGP" area="632.6" capital="cty-Singapore-Singapore" memberships="org-AsDB org-ASEAN org-Mekong-Group org-CP org-C org-CCC org-ESCAP org-G-77 org-IAEA org-IBRD org-ICC org-ICAO org-ICFTU org-Interpol org-IFRCS org-IFC org-ILO org-IMO org-Inmarsat org-IMF org-IOC org-ISO org-ICRM org-ITU org-Intelsat org-NAM org-PCA org-UN org-UNIKOM org-UPU org-WHO org-WIPO org-WMO org-WTrO">
  <name>Singapore</name>
  <population>3396924</population>
  <population_growth>1.9</population_growth>
  <infant_mortality>4.7</infant_mortality>
  <gdp_total>66100</gdp_total>
  <gdp_ind>28</gdp_ind>
  <gdp_serv>72</gdp_serv>
  <inflation>1.7</inflation>
  <indep_date>1965-08-09</indep_date>
  <government>republic within Commonwealth</government>
  <encompassed continent="asia" percentage="100"/>
  <ethnicgroups percentage="6.4">Indian</ethnicgroups>
  <ethnicgroups percentage="76.4">Chinese</ethnicgroups>
  <ethnicgroups percentage="14.9">Malay</ethnicgroups>
  <city id="cty-Singapore-Singapore" is_country_cap="yes" country="SGP">
    <name>Singapore</name>
    <longitude>103.833</longitude>
    <latitude>1.3</latitude>
    <population year="87">2558000</population>
    <located_at watertype="sea" sea="sea-SouthChinaSea"/>
    <located_on island="island-Singapore"/>
  </city>
</country>
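The validate command in the session above has a command-line counterpart, the --valid option, which checks the document against its DTD. The sketch below assumes xmllint is installed and uses a made-up file with a tiny internal DTD (not the real Mondial DTD) in which an island refers to a sea ID that does not exist.

```shell
#!/bin/sh
# Hypothetical sample: an island whose "sea" attribute points at an undeclared ID.
workdir=$(mktemp -d)
cat > "$workdir/islands.xml" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE mondial [
  <!ELEMENT mondial (sea*, island*)>
  <!ELEMENT sea EMPTY>
  <!ATTLIST sea id ID #REQUIRED>
  <!ELEMENT island EMPTY>
  <!ATTLIST island sea IDREF #REQUIRED>
]>
<mondial>
  <sea id="sea-SouthChinaSea"/>
  <island sea="sea-Unknown"/>
</mondial>
EOF

# --valid turns on DTD validation; --noout hides the document itself.
if xmllint --valid --noout "$workdir/islands.xml" 2>"$workdir/validate.err"; then
  echo "document is valid"
else
  echo "document is NOT valid:"
  cat "$workdir/validate.err"
fi
```

Pointing the sea attribute at sea-SouthChinaSea instead should make the validity error go away.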
Finding countries with an infant mortality lower than Singapore's (4.7):
country > xpath //country[infant_mortality<4.7]/name/text()
Object is a Node Set :
Set contains 9 nodes:
1  TEXT
    content=Andorra
2  TEXT
    content=Sweden
3  TEXT
    content=Iceland
4  TEXT
    content=Jersey
5  TEXT
    content=Man
6  TEXT
    content=Hong Kong
7  TEXT
    content=Japan
8  TEXT
    content=Anguilla
9  TEXT
    content=Bermuda
country > quit
The same query can be run straight from the command line too:
$ time xmllint --xpath '//country[infant_mortality<4.7]/name' --format mondial.xml
<name>Andorra</name><name>Sweden</name><name>Iceland</name><name>Jersey</name><name>Man</name><name>Hong Kong</name><name>Japan</name><name>Anguilla</name><name>Bermuda</name>

real 0m0.219s
user 0m0.192s
sys 0m0.020s
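The --xpath option is not limited to node sets. XPath functions such as count() and string() return a plain number or string, which is easier to use in a script. A small sketch, assuming a version of xmllint that supports --xpath; the sample file and the names Lowland and Highland are invented, and only Singapore's 4.7 comes from the data above.

```shell
#!/bin/sh
# Hypothetical sample data with three countries.
workdir=$(mktemp -d)
cat > "$workdir/sample.xml" <<'EOF'
<mondial>
  <country><name>Lowland</name><infant_mortality>2.5</infant_mortality></country>
  <country><name>Singapore</name><infant_mortality>4.7</infant_mortality></country>
  <country><name>Highland</name><infant_mortality>10.0</infant_mortality></country>
</mondial>
EOF

# count() yields a number; string() yields the text of the first matching node.
count=$(xmllint --xpath 'count(//country[infant_mortality<4.7])' "$workdir/sample.xml")
first=$(xmllint --xpath 'string(//country[infant_mortality<4.7]/name)' "$workdir/sample.xml")
echo "countries below 4.7: $count (first match: $first)"
```

Note that how --xpath separates the nodes of a node set differs between libxml2 versions, so scalar results like these are the safer thing to parse.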
Alternatively, you can look up Singapore's value inside the query instead of hard-coding it:
$ xmllint --shell mondial.xml
/ > xpath //country[infant_mortality<//country[name="Singapore"]/infant_mortality]/name/text()
Object is a Node Set :
Set contains 9 nodes:
1  TEXT
    content=Andorra
2  TEXT
    content=Sweden
3  TEXT
    content=Iceland
4  TEXT
    content=Jersey
5  TEXT
    content=Man
6  TEXT
    content=Hong Kong
7  TEXT
    content=Japan
8  TEXT
    content=Anguilla
9  TEXT
    content=Bermuda

$ time xmllint --xpath '//country[infant_mortality<//country[name="Singapore"]/infant_mortality]/name' --format mondial.xml
<name>Andorra</name><name>Sweden</name><name>Iceland</name><name>Jersey</name><name>Man</name><name>Hong Kong</name><name>Japan</name><name>Anguilla</name><name>Bermuda</name>

real 0m2.074s
user 0m2.052s
sys 0m0.016s
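The nested query took about ten times longer than the hard-coded one. I have not verified why, but one plausible cause is that the inner //country lookup is evaluated again for every candidate country. A possible workaround is to fetch the threshold once and substitute it into a second query. The sketch below assumes xmllint with --xpath support and uses an invented sample file (only Singapore's 4.7 is taken from the real data).

```shell
#!/bin/sh
# Hypothetical sample data with three countries.
workdir=$(mktemp -d)
cat > "$workdir/sample.xml" <<'EOF'
<mondial>
  <country><name>Lowland</name><infant_mortality>2.5</infant_mortality></country>
  <country><name>Singapore</name><infant_mortality>4.7</infant_mortality></country>
  <country><name>Highland</name><infant_mortality>10.0</infant_mortality></country>
</mondial>
EOF

# Step 1: look up Singapore's value once.
threshold=$(xmllint --xpath 'string(//country[name="Singapore"]/infant_mortality)' "$workdir/sample.xml")

# Step 2: reuse it as a literal in the second query (double quotes let the shell expand it).
below=$(xmllint --xpath "count(//country[infant_mortality<$threshold])" "$workdir/sample.xml")
echo "threshold=$threshold, countries below it: $below"

# The matching names themselves:
xmllint --xpath "//country[infant_mortality<$threshold]/name/text()" "$workdir/sample.xml"
echo
```

This only makes sense when the looked-up value is a plain number, since it is pasted into the XPath expression unquoted.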
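xmllint is definitely the preferred XML companion. It is fast and efficient (about 0.2 seconds for the simple query and about 2 seconds for the nested one on this file), and compared with Firefox and tdom it gave the most useful error report.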
2 Comments:
Hi
I got an error, "xpath: command not found", when I'm outside the shell. Please help me.
The xpath in my post is a command within the xmllint shell, not a standalone program.
If you install libxml-xpath-perl, you will have a /usr/bin/xpath Perl script.