Extracting values from XML on bash

Most modern applications use JSON and YAML for communication and configuration. However, many applications still use XML. If you need to work with an XML file from bash, this post walks through examples of extracting values from it with xmllint.

xmllint is not the only option for XML extraction; xmlstarlet and xidel are alternatives, but this post is limited to xmllint. It is also tempting to pull values out of XML with awk, sed, or grep and a regular expression. That is a dangerous bet: any change in the schema, or even in the layout of the file, can silently break the extraction. Prefer an XML-aware tool such as xmllint.
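
To make the risk concrete, here is a minimal sketch of a regex-based extraction that depends on the file layout. The two XML snippets and the extract_names helper are made up for illustration. The same data is fed in twice, once pretty-printed and once on a single line.

```shell
# Hypothetical data: identical content, two different layouts.
pretty='<students>
  <student><name>Roy</name></student>
  <student><name>Bob</name></student>
</students>'
minified='<students><student><name>Roy</name></student><student><name>Bob</name></student></students>'

# extract_names is a hypothetical helper: a line-oriented regex that
# assumes at most one <name> element per line.
extract_names() {
    sed -n 's:.*<name>\(.*\)</name>.*:\1:p'
}

pretty_out=$(printf '%s\n' "$pretty" | extract_names)
minified_out=$(printf '%s\n' "$minified" | extract_names)

printf 'pretty layout:\n%s\n' "$pretty_out"
printf 'minified layout:\n%s\n' "$minified_out"
```

On the pretty-printed input the regex returns both names, but on the single-line input the greedy match keeps only the last one and drops the rest without any error. An XML-aware query does not depend on line breaks.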

Example input XML file (sample.xml):

<?xml version="1.0"?>
<results>
  <students>
    <student>
      <name>Roy</name>
      <maths>89</maths>
      <physics>91</physics>
      <chemistry>73</chemistry>
    </student>
    <student>
      <name>Bob</name>
      <maths>81</maths>
      <physics>67</physics>
      <chemistry>70</chemistry>
    </student>
    <student>
      <name>Jenny</name>
      <maths>99</maths>
      <physics>99</physics>
      <chemistry>99</chemistry>
    </student>
  </students>
</results>


An equivalent JSON for the example XML (only for visualization):

{
	"results": {
		"students": {
			"student": [
				{
					"name": "Roy",
					"maths": "89",
					"physics": "91",
					"chemistry": "73"
				},
				{
					"name": "Bob",
					"maths": "81",
					"physics": "67",
					"chemistry": "70"
				},
				{
					"name": "Jenny",
					"maths": "99",
					"physics": "99",
					"chemistry": "99"
				}
			]
		}
	}
}


NOTE: If the --xpath flag is not supported by your xmllint


Some older distributions ship an xmllint that does not support every command-line flag. XPath querying is one of the most sought-after features of xmllint, but commands that rely on the --xpath flag fail on those builds. Do not be disheartened; the same XPath query can be run through the xmllint shell instead. If you are new to xmllint, start by assuming your build has every capability; the shell-based workaround is there for the small set of readers whose xmllint lacks --xpath.


The first method passes the query as a command-line argument (available in all modern distros):
xmllint  --xpath '//results/*' sample.xml 

The second method pipes the query into the xmllint shell via stdin (the workaround for old servers):
echo "cat //results/*" |xmllint  --shell sample.xml  |grep -v '^/ >'
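
If a script has to run on both kinds of machines, the two methods can be wrapped in one function. This is only a sketch: xpath_query is a hypothetical helper name, and detecting support by searching the usage text for --xpath is an assumption about how your build reports its options.

```shell
# xpath_query is a hypothetical helper name; it is not part of xmllint.
# Usage: xpath_query 'XPATH' file.xml
xpath_query() {
    # Assumption: a build that supports --xpath mentions it in its usage text.
    if xmllint --help 2>&1 | grep -q -- '--xpath'; then
        xmllint --xpath "$1" "$2"
    else
        # Fallback: run "cat XPATH" in the xmllint shell and drop the prompt lines.
        echo "cat $1" | xmllint --shell "$2" | grep -v '^/ >'
    fi
}

# Try it on a throwaway file.
sample=$(mktemp)
printf '%s\n' '<results><students><student><name>Roy</name></student></students></results>' > "$sample"
first=$(xpath_query '/results/students/student[1]/name' "$sample")
echo "$first"
rm -f "$sample"
```

The fallback branch only makes sense for expressions that select nodes, because it relies on the shell's cat command.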


Examples:

Example-1: Formatting an XML
xmllint  --format sample.xml 
<?xml version="1.0"?>
<results>
  <students>
    <student>
      <name>Roy</name>
      <maths>89</maths>
      <physics>91</physics>
      <chemistry>73</chemistry>
    </student>
    <student>
      <name>Bob</name>
      <maths>81</maths>
      <physics>67</physics>
      <chemistry>70</chemistry>
    </student>
    <student>
      <name>Jenny</name>
      <maths>99</maths>
      <physics>99</physics>
      <chemistry>99</chemistry>
    </student>
  </students>
</results>

Example-2: Pretty printing the XML
--pretty STYLE : pretty-print in a particular style
	                 0 Do not pretty print
	                 1 Format the XML content, as --format
	                 2 Add whitespace inside tags, preserving content
xmllint  --pretty 0 sample2.xml 
<?xml version="1.0"?>
<tools>
	<tool type="editor">
		<name>vim</name>
		<name>nano</name>
	</tool>
	<tool type="browser">
		<name>chrome</name>
		<name>firefox</name>
		<name>brave</name>
	</tool>
	<tool type="photoeditor">
		<name>photoshop</name>
		<name>gimp</name>
	</tool>
</tools>

Example-3: Remove the newlines and indentation between tags
xmllint --noblanks sample.xml 
<?xml version="1.0"?>
<results><students><student><name>Roy</name><maths>89</maths><physics>91</physics><chemistry>73</chemistry></student><student><name>Bob</name><maths>81</maths><physics>67</physics><chemistry>70</chemistry></student><student><name>Jenny</name><maths>99</maths><physics>99</physics><chemistry>99</chemistry></student></students></results>

Example-4: Validating the XML


If the input file is well-formed XML, xmllint prints the file without any error; otherwise it reports the error and points at the offending position. Note that this checks well-formedness (syntax) only, not validity against a DTD or schema. In the example below, I changed a closing tag from “name” to “name1”, so the opening and closing tags no longer match.

xmllint  sample.xml 
sample.xml:4: parser error : Opening and ending tag mismatch: name line 4 and name1
        <name>Roy</name1>
                         ^
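
In a script you usually want a yes/no answer rather than the printed document. The sketch below relies on xmllint's exit status together with the --noout flag; is_well_formed is a hypothetical helper name, and the two temporary files are made up for the demonstration.

```shell
# is_well_formed is a hypothetical helper name.
is_well_formed() {
    # --noout suppresses the re-printed document; parser errors go to stderr,
    # which is silenced here because only the exit status is of interest.
    xmllint --noout "$1" 2>/dev/null
}

good=$(mktemp)
bad=$(mktemp)
printf '%s\n' '<results><name>Roy</name></results>'  > "$good"
printf '%s\n' '<results><name>Roy</name1></results>' > "$bad"

if is_well_formed "$good"; then good_status=ok; else good_status=broken; fi
if is_well_formed "$bad";  then bad_status=ok;  else bad_status=broken;  fi

echo "good file: $good_status"
echo "bad file:  $bad_status"
rm -f "$good" "$bad"
```

Drop the 2>/dev/null redirection if you also want the parser's error message in your logs.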

Example-5: Printing the structure of the XML
echo 'du /' |xmllint  --shell sample.xml |grep -v '^/ >'
results
  students
    student
      name
      maths
      physics
      chemistry
    student
      name
      maths
      physics
      chemistry
    student
      name
      maths
      physics
      chemistry

Example-6: Extract the entire XML
xmllint  --xpath '/*' sample.xml 

#or 

echo "cat /*" |xmllint  --shell sample.xml  |grep -v '^/ >'

<results>
<students>
    <student>
        <name>Roy</name>
        <maths>89</maths>
        <physics>91</physics>
        <chemistry>73</chemistry>
    </student>
    <student>
        <name>Bob</name>
        <maths>81</maths>
        <physics>67</physics>
        <chemistry>70</chemistry>
    </student>
    <student>
        <name>Jenny</name>
        <maths>99</maths>
        <physics>99</physics>
        <chemistry>99</chemistry>
    </student>
    </students>
</results>

Example-7: Extract the first element
xmllint  --xpath '//results/students/student[1]' sample.xml

#or

echo 'cat //results/students/student[1]' | xmllint --shell  sample.xml  |grep -Ev '^/ >'
<student>
        <name>Roy</name>
        <maths>89</maths>
        <physics>91</physics>
        <chemistry>73</chemistry>
</student>
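
The same positional predicate works for the last element through the XPath last() function, and wrapping the expression in string() returns plain text instead of an XML fragment. A minimal sketch on a cut-down copy of the sample file (the temporary file is only for the demonstration):

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<results><students>
  <student><name>Roy</name></student>
  <student><name>Bob</name></student>
  <student><name>Jenny</name></student>
</students></results>
EOF

# string() returns the text content of the selected node,
# so the result is "Roy" rather than "<name>Roy</name>".
first=$(xmllint --xpath 'string(/results/students/student[1]/name)' "$sample")
last=$(xmllint --xpath 'string(/results/students/student[last()]/name)' "$sample")

echo "first=$first last=$last"
rm -f "$sample"
```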

Example-8: Extract a value from all the nodes
xmllint  --xpath '//results/students/student[*]/name/text()' sample.xml

#or

echo 'cat //results/students/student[*]/name/text()' | xmllint --shell  sample.xml  |grep -Ev '^/ >|^ -+$'
Roy
Bob
Jenny

Example-9: Concatenate the fields in XML
xmllint  --xpath 'concat(/results/students/student[1]/name," ",/results/students/student[1]/maths," " ,/results/students/student[1]/physics," ", /results/students/student[1]/chemistry)' sample.xml
Roy 89 91 73

Example-10: Looping over elements of XML and concatenating the fields for each node
for i in {1..3}; do
    xmllint  --xpath "concat(/results/students/student[${i}]/name,' ',/results/students/student[${i}]/maths,' ' ,/results/students/student[${i}]/physics,' ', /results/students/student[${i}]/chemistry)" sample.xml; 
done
Roy 89 91 73
Bob 81 67 70
Jenny 99 99 99
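
The loop above hard-codes the number of students. One way to avoid that, sketched here on a cut-down copy of the sample file, is to ask XPath for the count first with the count() function and use it as the loop bound. The variable names are illustrative.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<results><students>
  <student><name>Roy</name><maths>89</maths></student>
  <student><name>Bob</name><maths>81</maths></student>
  <student><name>Jenny</name><maths>99</maths></student>
</students></results>
EOF

# count() returns the number of matching nodes.
total=$(xmllint --xpath 'count(/results/students/student)' "$sample")

i=1
while [ "$i" -le "${total:-0}" ]; do
    # One row per student: name and maths mark, separated by a space.
    row=$(xmllint --xpath "concat(/results/students/student[$i]/name, ' ', /results/students/student[$i]/maths)" "$sample")
    echo "$row"
    i=$((i + 1))
done
rm -f "$sample"
```

This still starts one xmllint process per student, which is fine for small files; for large documents a tool with built-in iteration may be a better fit.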

Example-11: Selecting an element by its name in XML using the contains() function
# The first student's name element contains the substring 'Roy'
#
xmllint  --xpath '//results/students/student[contains(name,"Roy")]' sample.xml 
<student>
        <name>Roy</name>
        <maths>89</maths>
        <physics>91</physics>
        <chemistry>73</chemistry>
    </student>


xmllint  --xpath '//results/students/student[contains(name,"Jenny")]' sample.xml 
<student>
        <name>Jenny</name>
        <maths>99</maths>
        <physics>99</physics>
        <chemistry>99</chemistry>
    </student>
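
Keep in mind that contains() is a substring test, so it can match more than you intend. The sketch below uses a made-up file with two similar names and counts the matches for contains() against an exact comparison with =.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<results><students>
  <student><name>Roy</name></student>
  <student><name>Royce</name></student>
  <student><name>Bob</name></student>
</students></results>
EOF

# contains() matches both "Roy" and "Royce".
substring_hits=$(xmllint --xpath 'count(//student[contains(name,"Roy")])' "$sample")

# An equality test matches only the student named exactly "Roy".
exact_hits=$(xmllint --xpath 'count(//student[name="Roy"])' "$sample")

echo "contains: $substring_hits, exact: $exact_hits"
rm -f "$sample"
```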

Example-12: Excluding an element by using not contains()


In the following example, I have excluded every student element whose name contains ‘Jenny’ (the match is case-sensitive).

xmllint  --xpath '//results/students/student[not (contains(name,"Jenny"))]' sample.xml 
<student>
        <name>Roy</name>
        <maths>89</maths>
        <physics>91</physics>
        <chemistry>73</chemistry>
    </student>
<student>
        <name>Bob</name>
        <maths>81</maths>
        <physics>67</physics>
        <chemistry>70</chemistry>
    </student>

Example-13: Extracting an attribute value


Here is a new input XML file (sample2.xml), in which tools are grouped by type (e.g. editor, browser, photo editor). Notice that the type of each tool is given as an attribute of the tool tag.

<tools>
	<tool type="editor">
		<name>vim</name>
		<name>nano</name>
	</tool>
	<tool type="browser">
		<name>chrome</name>
		<name>firefox</name>
		<name>brave</name>
	</tool>
	<tool type="photoeditor">
		<name>photoshop</name>
		<name>gimp</name>
	</tool>
</tools>
xmllint --xpath '//tools/tool/@type' sample2.xml 
 type="editor"
 type="browser"
 type="photoeditor"
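
The query above prints each attribute as name="value". If only the bare value is needed, or if the attribute should act as a filter, the string() function and an attribute predicate can help. A minimal sketch on a shortened copy of the tools file (the temporary file and variable names are illustrative):

```shell
tools=$(mktemp)
cat > "$tools" <<'EOF'
<tools>
  <tool type="editor"><name>vim</name><name>nano</name></tool>
  <tool type="browser"><name>chrome</name><name>firefox</name></tool>
</tools>
EOF

# string() on an attribute node yields just its value, without type="...".
first_type=$(xmllint --xpath 'string(/tools/tool[1]/@type)' "$tools")

# The attribute can also select a tool: first name listed under type="browser".
first_browser=$(xmllint --xpath 'string(/tools/tool[@type="browser"]/name[1])' "$tools")

echo "$first_type"
echo "$first_browser"
rm -f "$tools"
```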

Example-14: Finding a tag or string in a big XML file


Suppose you want to find the XML path where the details of a student called “Roy” are located. You can start xmllint with the --shell flag and then issue its grep command to get the path of the matching tag. Notice in the example below that the index is [1] for Roy, [2] for Bob, and [3] for Jenny. This can help you locate a tag deep inside a complex XML file.

xmllint  --shell sample.xml
/ > 
/ > grep Roy
/results/students/student[1]/name : ta-        3 Roy
/ > cat /results/students/student[1]/*
 -------
<name>Roy</name>
 -------
<maths>89</maths>
 -------
<physics>91</physics>
 -------
<chemistry>73</chemistry>
/ > 
/ > grep Bob
/results/students/student[2]/name : ta-        3 Bob
/ > 
/ > grep Jenny
/results/students/student[3]/name : ta-        5 Jenny
/ > 
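
The same search can be run without an interactive session by piping the grep command into the xmllint shell. This is a sketch under the assumption that your build prints the "/ > " prompt in front of the output; the sed step strips that prompt and the final grep keeps only the "path : details" lines.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<results><students>
  <student><name>Roy</name></student>
  <student><name>Bob</name></student>
</students></results>
EOF

# Feed the shell's grep command on stdin, strip the prompt, keep match lines.
hits=$(echo 'grep Bob' | xmllint --shell "$sample" | sed 's:^/ > ::' | grep ' : ')

# Everything before " : " is the path of the matching element.
path=${hits%% : *}
echo "$path"
rm -f "$sample"
```

The resulting path can then be passed straight to --xpath to fetch the element.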
