How to parse XML using shellscript?

What is the best way to parse an XML file from a shell script?

  • Should this be done manually?
  • Is there a third-party library?

If you have already done this, could you tell me how you managed it?

+48
linux bash shell
Jan 13 '11 at
11 answers

You can try xmllint.

The xmllint program parses one or more XML files specified on the command line as xmlfile. It prints various types of output, depending on the options selected. It is useful for detecting errors both in XML code and in the XML parser itself.

It allows you to select elements in an XML document by XPath using the --pattern option.

On Mac OS X (Yosemite), it is installed by default.
On Ubuntu, if it is not already installed, you can run apt-get install libxml2-utils
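
As a rough sketch of the XPath route: newer xmllint builds also have an --xpath option that evaluates an expression and prints the result. The sample document and the helper name first_email below are made up for illustration, and the script assumes your xmllint supports --xpath.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an xmllint build that supports --xpath.

# Hypothetical sample document, written to a temporary file.
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<people>
  <person><name>Alice</name><email>alice@example.org</email></person>
  <person><name>Bob</name><email>bob@example.org</email></person>
</people>
EOF

# Print the text of the first person's <email> element.
# string(...) asks for the text value instead of the element markup.
first_email() {
    xmllint --xpath 'string(//person[1]/email)' "$1"
}

if command -v xmllint >/dev/null 2>&1; then
    first_email "$xml_file"
else
    echo "xmllint not found (on Ubuntu: apt-get install libxml2-utils)" >&2
fi
```

Wrapping the expression in string() keeps the tags out of the output, which makes the result easier to assign to a shell variable.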

+64
Jan 13 '11 at 17:27

Here is a complete working example.
If you only need to extract email addresses, you can simply do something like this:
1) Assume the XML file spam.xml looks like this:

    <spam>
      <victims>
        <victim>
          <name>The Pope</name>
          <email>pope@vatican.gob.va</email>
          <is_satan>0</is_satan>
        </victim>
        <victim>
          <name>George Bush</name>
          <email>father@nwo.com</email>
          <is_satan>1</is_satan>
        </victim>
        <victim>
          <name>George Bush Jr</name>
          <email>son@nwo.com</email>
          <is_satan>0</is_satan>
        </victim>
      </victims>
    </spam>

2) You can extract the email addresses and process them with this short bash code:

    #!/bin/bash
    emails=($(grep -oP '(?<=email>)[^<]+' "/my_path/spam.xml"))

    for i in ${!emails[*]}
    do
      echo "$i" "${emails[$i]}"
      # instead of echo use the values to send emails, etc
    done

The result of this example:

    0 pope@vatican.gob.va
    1 father@nwo.com
    2 son@nwo.com

Important Note:
Do not use this for anything serious. It is fine for playing around, getting quick results, learning grep, etc., but for production you should definitely look for, learn and use a real XML parser (see Micha's comment below).

+16
Jun 06 '14 at 18:01

There is also xmlstarlet (which is also available for Windows).

http://xmlstar.sourceforge.net/doc/xmlstarlet.txt
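
For a feel of what that looks like, here is a small sketch using the sel (select) subcommand. The sample document and the helper name list_emails are invented for the example, and I am assuming the binary is installed under the name xmlstarlet (some packages call it xml).

```shell
#!/usr/bin/env bash
# Sketch only: assumes the binary is installed as "xmlstarlet".

# Hypothetical sample document, written to a temporary file.
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<people>
  <person><name>Alice</name><email>alice@example.org</email></person>
  <person><name>Bob</name><email>bob@example.org</email></person>
</people>
EOF

# sel = select. -t starts a template, -m iterates over every node that
# matches the XPath, -v . prints the value of the matched node and
# -n adds a newline after each one.
list_emails() {
    xmlstarlet sel -t -m '//person/email' -v . -n "$1"
}

if command -v xmlstarlet >/dev/null 2>&1; then
    list_emails "$xml_file"
else
    echo "xmlstarlet not found" >&2
fi
```

Because each value lands on its own line, the output can be fed straight into a while read loop.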

+11
Jan 13 '11 at 15:57

I am surprised that no one has mentioned xmlsh. Its mission statement:

Command line for XML. Based on the philosophy and design of Unix Shells

xmlsh provides a familiar scripting environment, but one designed specifically for scripting XML processes.

A list of shell-like commands is provided here.

I use the xed command a lot. It is the equivalent of sed for XML and allows XPath-based search and replace.

+10
Jan 31 '13 at 7:27

Try sgrep. It is not clear what exactly you are trying to do, but I certainly would not write an XML parser in bash.

+7
Jan 13 '11 at 12:46

Do you have xml_grep installed? It is a Perl-based utility that ships as standard with some distributions (it was pre-installed on my CentOS system). Instead of giving it a regular expression, you give it an XPath expression.
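
A quick sketch of how that can look, assuming the xml_grep that comes with Perl's XML::Twig and its --cond and --text_only options. The sample document and the helper name list_emails are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the xml_grep script from Perl's XML::Twig is installed.

# Hypothetical sample document, written to a temporary file.
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<people>
  <person><name>Alice</name><email>alice@example.org</email></person>
  <person><name>Bob</name><email>bob@example.org</email></person>
</people>
EOF

# --cond takes the XPath-like condition to look for.
# --text_only prints only the text of each hit instead of wrapped XML.
list_emails() {
    xml_grep --text_only --cond '//person/email' "$1"
}

if command -v xml_grep >/dev/null 2>&1; then
    list_emails "$xml_file"
else
    echo "xml_grep not found" >&2
fi
```

Note that xml_grep only understands the XPath subset supported by XML::Twig, so more elaborate expressions may not work.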

+7
Jan 13 '11 at 17:05

A fairly new project is the xml-coreutils package containing xml-cat, xml-cp, xml-cut, xml-grep, ...

http://xml-coreutils.sourceforge.net/contents.html

+4
Jan 13 '11 at 18:29

Try using xpath. You can use it to pull elements out of the XML tree.

http://www.ibm.com/developerworks/xml/library/x-tipclp/index.html

+3
Feb 21 '12 at 20:18

This really goes beyond what a shell script is good at. Shell scripts and the standard Unix tools are fine for parsing line-oriented files, but things change when you are dealing with XML. Even simple tags can be a problem:

    <MYTAG>Data</MYTAG>
    <MYTAG> Data </MYTAG>
    <MYTAG param="value">Data</MYTAG>
    <MYTAG><ANOTHER_TAG>Data </ANOTHER_TAG></MYTAG>

Imagine trying to write a shell script that can read the data enclosed in these tags. These four very simple XML examples all show different ways this can be a problem. The first two use the same syntax and differ only in the whitespace around the data. The third simply has an attribute attached. The fourth wraps the data in another tag. Simple sed, awk and grep commands cannot catch all the possibilities.
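
To make that concrete, here is a small sketch that runs one naive grep pattern over the four variants above. The pattern and the names samples, naive_pattern and count_naive_matches are my own, chosen only for illustration.

```shell
#!/usr/bin/env bash
# The four spellings of <MYTAG> discussed above, one per array element.
samples=(
    '<MYTAG>Data</MYTAG>'
    '<MYTAG> Data </MYTAG>'
    '<MYTAG param="value">Data</MYTAG>'
    '<MYTAG><ANOTHER_TAG>Data </ANOTHER_TAG></MYTAG>'
)

# A naive pattern: an opening tag without attributes, some text that
# contains no '<', then the closing tag.
naive_pattern='<MYTAG>[^<]+</MYTAG>'

# Count how many of the sample lines the naive pattern matches.
count_naive_matches() {
    printf '%s\n' "${samples[@]}" | grep -cE "$naive_pattern"
}

count_naive_matches   # prints 2: the attribute and nested variants are missed
```

You can keep patching the pattern for attributes, nesting, comments and so on, but every patch adds a new corner case, which is the reason a real parser is the better tool.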

You need to use a full-blown scripting language such as Perl, Python, or Ruby. Each of them has modules that can parse XML data and make the underlying structure easy to access. I use XML::Simple in Perl. It took me a few attempts to figure it out, but it did what I needed and made my programming much easier.

+2
Jan 13 2018-11-11T00:

Here's a function that converts XML name-value pairs and attributes into bash variables.

http://www.humbug.in/2010/parse-simple-xml-files-using-bash-extract-name-value-pairs-and-attributes/
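
To give an idea of the general technique, here is a minimal pure-bash sketch. It is not the code from the linked article. The function names read_dom and xml_to_pairs and the sample input are my own, and it only copes with flat <name>value</name> pairs.

```shell
#!/usr/bin/env bash
# Sketch only: handles flat <name>value</name> pairs. It ignores attributes,
# comments, CDATA sections and entity decoding.

# Read everything up to the next '<', then split it at the first '>':
# TAG receives the tag name, CONTENT the text that follows the tag.
read_dom() {
    local IFS='>'
    read -r -d '<' TAG CONTENT
}

# Print name=value for every opening tag that is directly followed by text.
xml_to_pairs() {
    local TAG CONTENT
    while read_dom; do
        # Skip the empty first chunk, closing tags and whitespace-only content.
        if [[ -n $TAG && $TAG != /* && -n ${CONTENT//[[:space:]]/} ]]; then
            printf '%s=%s\n' "$TAG" "$CONTENT"
        fi
    done
}

# Hypothetical sample input.
xml='<config><host>example.org</host><port>8080</port></config>'
xml_to_pairs <<< "$xml"
```

For the sample input this prints host=example.org and port=8080 on separate lines. Anything beyond this flat shape is better left to one of the real XML tools.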

+1
Jan 17 '11 at 3:12

Here's a solution using xml_grep (because xpath was not part of our distribution, and I did not want to add it to all production machines) ...

If you are looking for a specific parameter in an XML file, and if all the elements at a given level of the tree are unique and there are no attributes, you can use this convenient function:

    # File to be parsed
    xmlFile="xxxxxxx"

    # use xml_grep to find settings in an XML file
    # Input ($1): path to setting
    function getXmlSetting() {
      # Filter out the element name for parsing
      local element=`echo "$1" | sed 's/^.*\///'`

      # Verify the element is not empty
      local check=${element:?getXmlSetting invalid input: $1}

      # Parse out the CDATA from the XML element
      # 1) Find the element (xml_grep)
      # 2) Remove newlines (tr -d '\n')
      # 3) Extract CDATA by looking for *element> CDATA <element*
      # 4) Remove leading and trailing spaces
      local getXmlSettingResult=`xml_grep --cond "$1" "$xmlFile" 2>/dev/null | tr -d '\n' | sed -n -e "s/.*$element>[[:space:]]*\([^[:space:]].*[^[:space:]]\)[[:space:]]*<\/$element.*/\1/p"`

      # Return the result
      echo $getXmlSettingResult
    }

    #EXAMPLE
    logPath=`getXmlSetting //config/logs/path`
    check=${logPath:?"XML file missing //config/logs/path"}

This will work with this structure:

    <config>
      <logs>
        <path>/path/to/logs</path>
      </logs>
    </config>

It will also work with this (but the result will not contain the newlines):

    <config>
      <logs>
        <path>
          /path/to/logs
        </path>
      </logs>
    </config>

If you have duplicate <config> or <logs> or <path>, then it will only return the last one. You can probably change the function to return an array if it finds multiple matches.

FYI: This code works on RedHat 6.3 with GNU BASH 4.1.2, but I don't think I'm doing anything special, so it should work everywhere.

NOTE: If you are new to scripting, make sure you use the correct type of quotation mark. All three are used in this code: the regular single quote (literal text), the backtick (command execution) and the double quote (grouping).

+1
Nov 27 '12 at