xmllint - Answer to an XML Question
After some thought, I decided I should validate my answer. I downloaded a fairly sizeable XML file from the Mondial project for my test and deliberately removed one of the closing tags so that it is no longer well-formed. Neither tdom nor Firefox can identify where the problem is; both only report a mismatched tag at the very end of the file. Only xmllint points back to the unclosed element, giving the line of its opening tag:
$ diff mondial.xml mondial-malformed.xml
16819d16818
< </country>
$ firefox mondial-malformed.xml
Firefox
XML Parsing Error: mismatched tag. Expected: </country>.
Location: file:///home/chihung/Projects/xmllint/mondial-malformed.xml
Line Number 39564, Column 3:</mondial>
--^
$ tclsh
% package require tdom
0.8.3
% set doc [dom parse [tDOM::xmlReadFile mondial-malformed.xml]]
error "mismatched tag" at line 39564 character 2
"ude>
</desert>
</m <--Error-- ondial>
"
$ xmllint --shell mondial-malformed.xml
mondial-malformed.xml:39564: parser error : Opening and ending tag mismatch: country line 16795 and mondial
</mondial>
^
mondial-malformed.xml:39565: parser error : Premature end of data in tag mondial line 3
^
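The xmllint message is more useful because it names the element that was left open and the line where it was opened. Here is a minimal sketch of that idea in Python, assuming a plain stack of open tags on top of the standard expat bindings. It is not how xmllint is implemented; the helper name find_unclosed and the four-line sample are made up for illustration.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString


def find_unclosed(xml_text):
    """Return (tag, open_line, error_line, message) for the first
    well-formedness error, or None if the text parses cleanly."""
    open_tags = []  # stack of (tag name, line where the tag was opened)
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # CurrentLineNumber is the line of the start tag being reported.
        open_tags.append((name, parser.CurrentLineNumber))

    def on_end(name):
        open_tags.pop()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    try:
        parser.Parse(xml_text, True)
    except ExpatError as err:
        # The innermost element still on the stack is the one left open.
        tag, open_line = open_tags[-1] if open_tags else (None, None)
        return tag, open_line, err.lineno, ErrorString(err.code)
    return None


# Hypothetical miniature of the damaged file: </country> is missing.
sample = "<mondial>\n<country>\n<name>A</name>\n</mondial>\n"
print(find_unclosed(sample))
```

For the sample, the sketch reports country as the open element, opened on line 2, with the parser error raised on line 4. That is the same pair of facts xmllint prints for the real file.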
OK, xmllint is clearly the winner in this exercise. Here is xmllint in action:
$ xmllint --shell mondial.xml
/ > help
base display XML base of the node
setbase URI change the XML base of the node
bye leave shell
cat [node] display node or current node
cd [path] change directory to path or to root
dir [path] dumps informations about the node (namespace, attributes, content)
du [path] show the structure of the subtree under path or the current node
exit leave shell
help display this help
free display memory usage
load [name] load a new document with name
ls [path] list contents of path or the current directory
set xml_fragment replace the current node content with the fragment parsed in context
xpath expr evaluate the XPath expression in that context and print the result
setns nsreg register a namespace to a prefix in the XPath evaluation context
format for nsreg is: prefix=[nsuri] (i.e. prefix= unsets a prefix)
setrootns register all namespace found on the root element
the default namespace if any uses 'defaultns' prefix
pwd display current working directory
quit leave shell
save [name] save this document to name or the original name
write [name] write the current node to the filename
validate check the document for errors
relaxng rng validate the document agaisnt the Relax-NG schemas
grep string search for a string in the subtree
/ > validate
mondial.xml:35144: element island: validity error : Syntax of value for attribute sea of island is not valid
validity error : attribute sea line 35144 references an unknown ID ""
/ > base
mondial.xml
/ > dir
DOCUMENT
version=1.0
encoding=UTF-8
URL=mondial.xml
standalone=true
/ > grep Singapore
/mondial/country[105]/name : t-- 9 Singapore
/mondial/country[105]/city/name : t-- 9 Singapore
/mondial/island[163]/name : t-- 9 Singapore
/ > cd /mondial/country[105]
country > cat
<country car_code="SGP" area="632.6" capital="cty-Singapore-Singapore" memberships="org-AsDB org-ASEAN org-Mekong-Group org-CP org-C org-CCC org-ESCAP org-G-77 org-IAEA org-IBRD org-ICC org-ICAO org-ICFTU org-Interpol org-IFRCS org-IFC org-ILO org-IMO org-Inmarsat org-IMF org-IOC org-ISO org-ICRM org-ITU org-Intelsat org-NAM org-PCA org-UN org-UNIKOM org-UPU org-WHO org-WIPO org-WMO org-WTrO">
<name>Singapore</name>
<population>3396924</population>
<population_growth>1.9</population_growth>
<infant_mortality>4.7</infant_mortality>
<gdp_total>66100</gdp_total>
<gdp_ind>28</gdp_ind>
<gdp_serv>72</gdp_serv>
<inflation>1.7</inflation>
<indep_date>1965-08-09</indep_date>
<government>republic within Commonwealth</government>
<encompassed continent="asia" percentage="100"/>
<ethnicgroups percentage="6.4">Indian</ethnicgroups>
<ethnicgroups percentage="76.4">Chinese</ethnicgroups>
<ethnicgroups percentage="14.9">Malay</ethnicgroups>
<city id="cty-Singapore-Singapore" is_country_cap="yes" country="SGP">
<name>Singapore</name>
<longitude>103.833</longitude>
<latitude>1.3</latitude>
<population year="87">2558000</population>
<located_at watertype="sea" sea="sea-SouthChinaSea"/>
<located_on island="island-Singapore"/>
</city>
</country>
Finding countries with an infant_mortality lower than Singapore's:
country > xpath //country[infant_mortality<4.7]/name/text()
Object is a Node Set :
Set contains 9 nodes:
1 TEXT
content=Andorra
2 TEXT
content=Sweden
3 TEXT
content=Iceland
4 TEXT
content=Jersey
5 TEXT
content=Man
6 TEXT
content=Hong Kong
7 TEXT
content=Japan
8 TEXT
content=Anguilla
9 TEXT
content=Bermuda
country > quit
This can be turned into a one-line command too:
$ time xmllint --xpath '//country[infant_mortality<4.7]/name' --format mondial.xml
<name>Andorra</name><name>Sweden</name><name>Iceland</name><name>Jersey</name><name>Man</name><name>Hong Kong</name><name>Japan</name><name>Anguilla</name><name>Bermuda</name>
real 0m0.219s
user 0m0.192s
sys 0m0.020s
Alternatively, you can look up Singapore's value dynamically instead of hard-coding 4.7:
$ xmllint --shell mondial.xml
/ > xpath //country[infant_mortality<//country[name="Singapore"]/infant_mortality]/name/text()
Object is a Node Set :
Set contains 9 nodes:
1 TEXT
content=Andorra
2 TEXT
content=Sweden
3 TEXT
content=Iceland
4 TEXT
content=Jersey
5 TEXT
content=Man
6 TEXT
content=Hong Kong
7 TEXT
content=Japan
8 TEXT
content=Anguilla
9 TEXT
content=Bermuda
$ time xmllint --xpath '//country[infant_mortality<//country[name="Singapore"]/infant_mortality]/name' --format mondial.xml
<name>Andorra</name><name>Sweden</name><name>Iceland</name><name>Jersey</name><name>Man</name><name>Hong Kong</name><name>Japan</name><name>Anguilla</name><name>Bermuda</name>
real 0m2.074s
user 0m2.052s
sys 0m0.016s
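In the timings above, the dynamic query took about 2 seconds against about 0.2 seconds for the constant one. One possible reason, which I have not verified, is that the inner lookup for Singapore is evaluated again for every candidate country. Below is a rough Python sketch of the same query done in two steps: fetch the threshold once, then filter. It assumes one reference country and at most one infant_mortality per country. The sample document and the names Lowland, Highland and Nodata are hypothetical; only Singapore's 4.7 comes from the data shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of mondial.xml; only Singapore's value is real.
SAMPLE = """<mondial>
  <country><name>Singapore</name><infant_mortality>4.7</infant_mortality></country>
  <country><name>Lowland</name><infant_mortality>3.0</infant_mortality></country>
  <country><name>Highland</name><infant_mortality>9.9</infant_mortality></country>
  <country><name>Nodata</name></country>
</mondial>"""


def countries_below(root, reference_name):
    """Names of countries whose infant_mortality is below the reference's."""
    # Step 1: resolve the threshold once (the inner XPath of the query).
    threshold = None
    for country in root.iter("country"):
        if country.findtext("name") == reference_name:
            value = country.findtext("infant_mortality")
            if value is not None:
                threshold = float(value)
            break
    if threshold is None:
        return []

    # Step 2: keep countries that have a value and fall below the threshold.
    names = []
    for country in root.iter("country"):
        value = country.findtext("infant_mortality")
        if value is not None and float(value) < threshold:
            names.append(country.findtext("name"))
    return names


print(countries_below(ET.fromstring(SAMPLE), "Singapore"))
```

Countries without an infant_mortality element are skipped, which matches the XPath predicate: a comparison against an empty node set is false.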
xmllint is definitely the preferred XML companion. It is extremely fast and efficient compared with Firefox and tdom.

2 Comments:
Hi
I got an error: xpath command not found when I'm out of the shell. Please help me.
My xpath is a command within the xmllint shell.
If you install libxml-xpath-perl, you will have a /usr/bin/xpath Perl script.