The real problem is that what XML::Simple primarily tries to do is take XML and represent it as a Perl data structure.
As you will no doubt know from perldata, the two key data structures you have available are the hash and the array.
- Arrays are ordered lists of scalars.
- Hashes are unordered key-value pairs.
And XML does not really fit either. It has elements that:
- are not uniquely named (which means hashes do not "fit").
- .... but are "ordered" within the file.
- may have attributes (which you could insert into a hash)
- may have content (or may not: the element could be a unary, self-closing tag)
- may have children (to any depth)
And these things do not map directly onto the available Perl data structures. At a simplistic level, a nested hash of hashes might fit, but it cannot cope with elements that have duplicate names. Nor can you easily distinguish between attributes and child nodes.
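To make the duplicate-name problem concrete, here is a minimal sketch in core Perl only, with no XML module involved. The data and the names `@siblings`, `%by_name` and `@in_order` are made up for illustration; they stand in for two sibling elements that share a name.

```perl
use strict;
use warnings;

# Two sibling elements with the same name, in document order.
# (Illustrative data only; not produced by any XML parser.)
my @siblings = (
    [ another_child => 'first' ],
    [ another_child => 'second' ],
);

# Keyed by element name: the second element overwrites the first.
my %by_name;
$by_name{ $_->[0] } = $_->[1] for @siblings;

# Kept as an array: order and duplicates survive, but lookup by name is lost.
my @in_order = map { +{ name => $_->[0], text => $_->[1] } } @siblings;

print scalar( keys %by_name ), "\n";    # entries left in the hash
print scalar(@in_order), "\n";          # entries in the array
```

Neither shape alone keeps both the names and the order, which is why any mapping from XML onto hashes and arrays has to guess.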
So XML::Simple tries to guess based on the XML content, takes "hints" from the various option settings, and then, when you try to output the content, it (tries to) apply the same process in reverse.
As a result, for anything other than the simplest XML, it becomes unwieldy at best and loses data at worst.
Consider:
    <xml>
       <parent>
          <child att="some_att">content</child>
       </parent>
       <another_node>
          <another_child some_att="a value" />
          <another_child different_att="different_value">more content</another_child>
       </another_node>
    </xml>
This - when parsed through XML::Simple - gives you:
    $VAR1 = {
      'parent' => {
        'child' => {
          'att' => 'some_att',
          'content' => 'content'
        }
      },
      'another_node' => {
        'another_child' => [
          {
            'some_att' => 'a value'
          },
          {
            'different_att' => 'different_value',
            'content' => 'more content'
          }
        ]
      }
    };
Note: under parent you now have just anonymous hashes, but under another_node you have an array of anonymous hashes.
So, to access the content of child :
    my $child = $xml->{parent}->{child}->{content};
Note how you have a "child" node with a "content" node beneath it, which is not there because it is a node, but because it is ... content.
But to access the content beneath the first another_child element:
    my $another_child = $xml->{another_node}->{another_child}->[0]->{content};
Note how - because there are multiple <another_child> elements - the XML has been parsed into an array, where it was not with a single one. (If you did have an element called content beneath it, you would end up with something else again.) You can change this by using ForceArray , but then you end up with a hash of arrays of hashes of arrays of hashes of arrays - although it is at least consistent in its handling of child elements.
Edit: Note, following discussion - this is a bad default, rather than a flaw in XML::Simple.
You should set:
    ForceArray => 1, KeyAttr => [], ForceContent => 1
If you apply this to the XML above, you get instead:
    $VAR1 = {
      'another_node' => [
        {
          'another_child' => [
            {
              'some_att' => 'a value'
            },
            {
              'different_att' => 'different_value',
              'content' => 'more content'
            }
          ]
        }
      ],
      'parent' => [
        {
          'child' => [
            {
              'att' => 'some_att',
              'content' => 'content'
            }
          ]
        }
      ]
    };
This gives you consistency, because single-node elements are no longer handled differently from multi-node ones.
But you still:
- Have a tree five references deep to walk to get at a value.
For example:
    print $xml->{parent}->[0]->{child}->[0]->{content};
- Have content and child hash elements treated as if they were attributes, and because hashes are unordered, you simply cannot reconstruct the input. So basically, you have to parse it and then run it through Dumper to figure out where you need to look.
But with an xpath query, you get at that node with:
    findnodes("/xml/parent/child");
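The one-liner above does not say which module `findnodes` belongs to. As a hedged sketch, assuming XML::LibXML is installed and using a cut-down copy of the sample document, the same query could look like this (the variable names are illustrative):

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml_text = <<'XML';
<xml>
  <parent>
    <child att="some_att">content</child>
  </parent>
</xml>
XML

my $dom = XML::LibXML->load_xml( string => $xml_text );

# findnodes returns every node matching the XPath expression;
# list assignment keeps the first match.
my ($child) = $dom->findnodes('/xml/parent/child');

print $child->textContent, "\n";            # text inside <child>
print $child->getAttribute('att'), "\n";    # attribute, kept separate from content
```

The path in the query mirrors the document itself, so there is no need to dump the structure first to find out where the value ended up.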
What you don't get in XML::Simple that you do get in XML::Twig (and, I presume, XML::LibXML , but I know it less well):
- xpath support. xpath is an XML way of expressing a path to a node. So you can "find" the node in the above example with get_xpath('//child') . You can even use attributes in the xpath - like get_xpath('//another_child[@different_att]') , which will select exactly the one you want. (You can iterate over matches too.)
- cut and paste to move elements around.
- parsefile_inplace to let you modify XML with an in-place edit.
- pretty_print options to format XML.
- twig_handlers and purge - which allow you to process really big XML without having to load it all into memory.
- simplify if you really must make it backwards compatible with XML::Simple .
- the code is generally far simpler than trying to follow daisy chains of references to hashes and arrays, which can never be done consistently because of the fundamental differences in structure.
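A few of the features listed above can be sketched as follows. This assumes XML::Twig is installed and reuses the sample document from earlier; the variable names are illustrative, and parsefile_inplace and simplify are left out to keep it short.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml_text = <<'XML';
<xml>
  <parent><child att="some_att">content</child></parent>
  <another_node>
    <another_child some_att="a value" />
    <another_child different_att="different_value">more content</another_child>
  </another_node>
</xml>
XML

# 1. xpath with an attribute test; iterate over the matches.
my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse($xml_text);
my @matches = $twig->get_xpath('//another_child[@different_att]');
print $_->text, "\n" for @matches;

# 2. cut and paste: move <child> to the end of <another_node>.
my $child  = $twig->get_xpath( '//child', 0 );
my $target = $twig->root->first_child('another_node');
$child->cut;
$child->paste( last_child => $target );
$twig->print;    # output uses the pretty_print setting chosen above

# 3. twig_handlers and purge: handle each element as it is parsed,
#    then discard what has been processed so far.
my @seen;
XML::Twig->new(
    twig_handlers => {
        another_child => sub {
            my ( $t, $elt ) = @_;
            push @seen, $elt->text;
            $t->purge;
        },
    },
)->parse($xml_text);
print scalar(@seen), " another_child elements handled\n";
```

With a small in-memory string the purge call buys nothing; it matters when the handler runs against a file too large to hold as a complete tree.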
It is also widely available - easy to download from CPAN, and distributed as an installable package on many operating systems. (Sadly, it is not part of a default install.)
See: XML::Twig quick reference
For comparison:
    use XML::Simple;
    use Data::Dumper;

    my $xml = XMLin( \*DATA, ForceArray => 1, KeyAttr => [], ForceContent => 1 );

    print Dumper $xml;
    print $xml->{parent}->[0]->{child}->[0]->{content};
Vs.
    use XML::Twig;

    my $twig = XML::Twig->parse( \*DATA );
    print $twig->get_xpath( '/xml/parent/child', 0 )->text;
    print $twig->root->first_child('parent')->first_child_text('child');