Most modern applications use JSON and YAML for communication and configuration, but many applications still use XML. If you have to work with an XML file from bash, this post covers a range of examples for extracting values from it.
Note that there are alternatives to xmllint for XML extraction, such as xmlstarlet and xidel, but this post is limited to xmllint. Also note that it is tempting to use tools like awk, sed, or grep to extract XML values with regular expressions. That is a dangerous bet: these tools do not understand XML structure, so any change in the schema or even in the layout of the file may silently break the extraction. Consider using an XML-aware tool like xmllint instead.
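To make the risk concrete, here is a small sketch that compares a regex-based extraction with an XPath-based one. The function names are hypothetical, and the XPath assumes the example input file shown later in this post. Both functions try to print the maths marks of the first student.

```shell
# Naive approach: print the first <maths> value that a regex can match.
# The leading ".*" is greedy, so on a line holding several <maths> tags
# it captures the LAST one on that line.
first_maths_regex() {
  sed -n 's/.*<maths>\(.*\)<\/maths>.*/\1/p' "$1" | head -n 1
}

# XML-aware approach: ask for the maths element of the first student.
# string() returns the text content of the first matching node.
first_maths_xpath() {
  xmllint --xpath 'string(/results/students/student[1]/maths)' "$1"
}
```

On the example file with one tag per line, both functions print 89. On the same data collapsed onto a single line (for example, the output of Example-3), the regex version prints 99, which is Jenny's mark, while the XPath version still prints 89.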
- Example Input XML file:
- An equivalent JSON for the example XML (only for visualization)
- NOTE: If the XPath flag is not supported by your xmllint
- Examples:
- Example-1: Formatting an XML
- Example-2: Pretty printing the XML
- Example-3: Remove all the new lines and spaces
- Example-4: Validating the XML
- Example-5: Printing the structure of the XML
- Example-6: Extract the entire XML
- Example-7: Extract the first element
- Example-8: Extract a value from all the nodes
- Example-9: concatenate the fields in XML
- Example-10: Looping over elements of XML and concatenating the fields for each node
- Example-11: selecting an element by its name in XML using contains() function
- Example-12: Excluding an element by using not contains()
- Example-13: Extracting the tag value
- Example-14: Finding a tag or string in a big XML file
Example Input XML file:
<?xml version="1.0"?>
<results>
<students>
<student>
<name>Roy</name>
<maths>89</maths>
<physics>91</physics>
<chemistry>73</chemistry>
</student>
<student>
<name>Bob</name>
<maths>81</maths>
<physics>67</physics>
<chemistry>70</chemistry>
</student>
<student>
<name>Jenny</name>
<maths>99</maths>
<physics>99</physics>
<chemistry>99</chemistry>
</student>
</students>
</results>
An equivalent JSON for the example XML (only for visualization)
{
"results": {
"students": {
"student": [
{
"name": "Roy",
"maths": "89",
"physics": "91",
"chemistry": "73"
},
{
"name": "Bob",
"maths": "81",
"physics": "67",
"chemistry": "70"
},
{
"name": "Jenny",
"maths": "99",
"physics": "99",
"chemistry": "99"
}
]
}
}
}
NOTE: If the XPath flag is not supported by your xmllint
Some distributions, particularly older ones, ship an xmllint that does not support every feature as a command-line flag. XPath querying is one of the most sought-after features of xmllint, but the first command below will not work if your build does not support the --xpath flag. Do not be disheartened; there is a second way to run an XPath query. If you are new to xmllint, start by assuming you have all the capabilities; the workaround is provided for the small set of people whose xmllint does not expose them on the command line.
The first method uses command-line arguments (available in all modern distros):
xmllint --xpath '//results/*' sample.xml
The second method uses the xmllint shell and stdin (the workaround for old servers):
echo "cat //results/*" |xmllint --shell sample.xml |grep -v '^/ >'
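If a script has to run on both kinds of machines, the two methods can be wrapped in one helper. The following is only a sketch: xpath_query is a hypothetical name, and the detection assumes that the usage text printed by xmllint lists --xpath when the flag is supported.

```shell
# xpath_query QUERY FILE
# Uses --xpath when the installed xmllint advertises it, otherwise
# falls back to the shell-based workaround.
xpath_query() {
  query=$1
  file=$2
  # Assumption: the usage text mentions --xpath only when it is supported.
  if xmllint --help 2>&1 | grep -q -e '--xpath'; then
    xmllint --xpath "$query" "$file"
  else
    # The shell prints " -------" separator lines between nodes;
    # they are left in place here, only prompt lines are removed.
    printf 'cat %s\n' "$query" | xmllint --shell "$file" | grep -v '^/ >'
  fi
}
```

A call such as xpath_query '//results/*' sample.xml then works with either build, although the fallback output may contain the separator lines.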
Examples:
Example-1: Formatting an XML
xmllint --format sample.xml
<?xml version="1.0"?>
<results>
<students>
<student>
<name>Roy</name>
<maths>89</maths>
<physics>91</physics>
<chemistry>73</chemistry>
</student>
<student>
<name>Bob</name>
<maths>81</maths>
<physics>67</physics>
<chemistry>70</chemistry>
</student>
<student>
<name>Jenny</name>
<maths>99</maths>
<physics>99</physics>
<chemistry>99</chemistry>
</student>
</students>
</results>
Example-2: Pretty printing the XML
--pretty STYLE: pretty-print in a particular style
0: do not pretty print
1: format the XML content, as --format
2: add whitespace inside tags, preserving content
The command below uses sample2.xml, the input file shown in Example-13.
xmllint --pretty 0 sample2.xml
<?xml version="1.0"?>
<tools>
<tool type="editor">
<name>vim</name>
<name>nano</name>
</tool>
<tool type="browser">
<name>chrome</name>
<name>firefox</name>
<name>brave</name>
</tool>
<tool type="photoeditor">
<name>photoshop</name>
<name>gimp</name>
</tool>
</tools>
Example-3: Remove all the new lines and spaces
xmllint --noblanks sample.xml
<?xml version="1.0"?>
<results><students><student><name>Roy</name><maths>89</maths><physics>91</physics><chemistry>73</chemistry></student><student><name>Bob</name><maths>81</maths><physics>67</physics><chemistry>70</chemistry></student><student><name>Jenny</name><maths>99</maths><physics>99</physics><chemistry>99</chemistry></student></students></results>
Example-4: Validating the XML
If the input file is well-formed XML, xmllint prints the file without any error; otherwise, it reports the error and its location. In the example below, I changed a closing tag from “name” to “name1”, which causes a mismatch between the opening and closing tags. Note that this checks well-formedness only; validating against a DTD or a schema requires additional flags such as --valid or --schema.
xmllint sample.xml
sample.xml:4: parser error : Opening and ending tag mismatch: name line 4 and name1
<name>Roy</name1>
^
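In a script you usually care about the result of the check rather than the printed document. A minimal sketch, with is_well_formed as a hypothetical helper name: the --noout flag suppresses the document output, and xmllint returns a non-zero exit status when parsing fails.

```shell
# is_well_formed FILE
# Succeeds (exit status 0) when FILE parses as well-formed XML.
is_well_formed() {
  # --noout: do not print the parsed document; errors are discarded here
  xmllint --noout "$1" 2>/dev/null
}
```

It can then be used in a condition, for example: if is_well_formed sample.xml; then echo "OK"; else echo "broken"; fi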
Example-5: Printing the structure of the XML
echo 'du /' |xmllint --shell sample.xml |grep -v '^/ >'
results
students
student
name
maths
physics
chemistry
student
name
maths
physics
chemistry
student
name
maths
physics
chemistry
Example-6: Extract the entire XML
xmllint --xpath '/*' sample.xml
#or
echo "cat /*" |xmllint --shell sample.xml |grep -v '^/ >'
<results>
<students>
<student>
<name>Roy</name>
<maths>89</maths>
<physics>91</physics>
<chemistry>73</chemistry>
</student>
<student>
<name>Bob</name>
<maths>81</maths>
<physics>67</physics>
<chemistry>70</chemistry>
</student>
<student>
<name>Jenny</name>
<maths>99</maths>
<physics>99</physics>
<chemistry>99</chemistry>
</student>
</students>
</results>
Example-7: Extract the first element
xmllint --xpath '//results/students/student[1]' sample.xml
#or
echo 'cat //results/students/student[1]' | xmllint --shell sample.xml |grep -Ev '^/ >'
<student>
<name>Roy</name>
<maths>89</maths>
<physics>91</physics>
<chemistry>73</chemistry>
</student>
Example-8: Extract a value from all the nodes
xmllint --xpath '//results/students/student[*]/name/text()' sample.xml
#or
echo 'cat //results/students/student[*]/name/text()' | xmllint --shell sample.xml |grep -Ev '^/ >|^ -+$'
Roy
Bob
Jenny
Example-9: concatenate the fields in XML
xmllint --xpath 'concat(/results/students/student[1]/name," ",/results/students/student[1]/maths," " ,/results/students/student[1]/physics," ", /results/students/student[1]/chemistry)' sample.xml
Roy 89 91 73
Example-10: Looping over elements of XML and concatenating the fields for each node
for i in {1..3}; do
xmllint --xpath "concat(/results/students/student[${i}]/name,' ',/results/students/student[${i}]/maths,' ' ,/results/students/student[${i}]/physics,' ', /results/students/student[${i}]/chemistry)" sample.xml;
done
Roy 89 91 73
Bob 81 67 70
Jenny 99 99 99
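The loop above hard-codes the number of students. As a sketch of one way around that, the XPath count() function can supply the upper bound; print_marks is a hypothetical function name, and the paths assume the example input file.

```shell
# print_marks FILE
# Prints "name maths physics chemistry" for every student in FILE.
print_marks() {
  file=$1
  # count() returns the number of matching nodes, e.g. 3 for the sample file
  total=$(xmllint --xpath 'count(/results/students/student)' "$file")
  i=1
  while [ "$i" -le "$total" ]; do
    s="/results/students/student[$i]"
    xmllint --xpath "concat($s/name,' ',$s/maths,' ',$s/physics,' ',$s/chemistry)" "$file"
    i=$((i + 1))
  done
}
```

Running print_marks sample.xml should produce the same three lines as the loop above, and it keeps working when students are added or removed.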
Example-11: selecting an element by its name in XML using contains() function
# Select the student whose name element contains the substring 'Roy'
xmllint --xpath '//results/students/student[contains(name,"Roy")]' sample.xml
<student>
<name>Roy</name>
<maths>89</maths>
<physics>91</physics>
<chemistry>73</chemistry>
</student>
xmllint --xpath '//results/students/student[contains(name,"Jenny")]' sample.xml
<student>
<name>Jenny</name>
<maths>99</maths>
<physics>99</physics>
<chemistry>99</chemistry>
</student>
Example-12: Excluding an element by using not contains()
In the following example, I have excluded all the student elements whose name contains ‘Jenny’ (the match is case-sensitive).
xmllint --xpath '//results/students/student[not (contains(name,"Jenny"))]' sample.xml
<student>
<name>Roy</name>
<maths>89</maths>
<physics>91</physics>
<chemistry>73</chemistry>
</student>
<student>
<name>Bob</name>
<maths>81</maths>
<physics>67</physics>
<chemistry>70</chemistry>
</student>
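Keep in mind that contains() matches substrings, so a query for "Roy" would also select a hypothetical student named "Royce". The sketch below shows two other predicate styles, an exact match on the name and a numeric comparison on a subject. The function names are hypothetical and the paths assume the example input file.

```shell
# marks_of NAME SUBJECT FILE
# Prints the marks of the student whose name is exactly NAME.
# NAME must not contain a single quote, since it is pasted into the XPath.
marks_of() {
  xmllint --xpath "string(/results/students/student[name='$1']/$2)" "$3"
}

# count_above SUBJECT THRESHOLD FILE
# Prints how many students scored more than THRESHOLD in SUBJECT.
count_above() {
  xmllint --xpath "count(/results/students/student[$1 > $2])" "$3"
}
```

For the sample file, marks_of Bob physics sample.xml prints 67, and count_above maths 85 sample.xml prints 2 (Roy and Jenny).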
Example-13: Extracting the tag value
Here is a new input XML file (sample2.xml) containing a list of tools grouped by type (e.g., editor, browser, photo editor). Notice that the type of each tool is provided as an attribute (type) of the tool tag, so the query below selects attributes with the @ prefix.
<tools>
<tool type="editor">
<name>vim</name>
<name>nano</name>
</tool>
<tool type="browser">
<name>chrome</name>
<name>firefox</name>
<name>brave</name>
</tool>
<tool type="photoeditor">
<name>photoshop</name>
<name>gimp</name>
</tool>
</tools>
xmllint --xpath '//tools/tool/@type' sample2.xml
type="editor"
type="browser"
type="photoeditor"
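Attributes are also useful inside predicates and with string(). The sketch below uses hypothetical function names and assumes the sample2.xml layout shown above. When several text nodes match, some xmllint versions print them one per line and others print them without separators, so treat the multi-node output format as version-dependent.

```shell
# tool_type NAME FILE
# Prints only the value of the type attribute (without type="...")
# for the tool group that contains NAME.
tool_type() {
  xmllint --xpath "string(/tools/tool[name='$1']/@type)" "$2"
}

# tools_of_type TYPE FILE
# Prints the names listed under the tool group whose type is TYPE.
tools_of_type() {
  xmllint --xpath "/tools/tool[@type='$1']/name/text()" "$2"
}
```

For example, tool_type gimp sample2.xml prints photoeditor, and tools_of_type browser sample2.xml prints chrome, firefox, and brave.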
Example-14: Finding a tag or string in a big XML file
Suppose you want to find the XML path where the details of a student called “Roy” are located. You can start xmllint with the --shell flag and then issue the grep command to get the path of that particular tag. Notice in the example below that the index is [1] for Roy, [2] for Bob, and [3] for Jenny. This option helps you find the location of a tag deep down in a complex XML file.
xmllint --shell sample.xml
/ >
/ > grep Roy
/results/students/student[1]/name : ta- 3 Roy
/ > cat /results/students/student[1]/*
-------
<name>Roy</name>
-------
<maths>89</maths>
-------
<physics>91</physics>
-------
<chemistry>73</chemistry>
/ >
/ > grep Bob
/results/students/student[2]/name : ta- 3 Bob
/ >
/ > grep Jenny
/results/students/student[3]/name : ta- 5 Jenny
/ >
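The same search can be run without an interactive session by piping the grep command into the shell, in the same way as the earlier workaround. This is only a sketch: find_in_xml is a hypothetical name, and the filtering assumes the prompt and output format shown in the session above, which may differ between xmllint builds.

```shell
# find_in_xml STRING FILE
# Prints the shell's "path : ..." lines for nodes containing STRING.
find_in_xml() {
  printf 'grep %s\n' "$1" | xmllint --shell "$2" |
    sed 's|^/ > ||' |   # strip the "/ > " prompt if it shares the line
    grep ' : '          # keep only the result lines
}
```

For example, find_in_xml Roy sample.xml should print a line that starts with /results/students/student[1]/name.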