Perl: how to parse an XML file sequentially

I have an XML file that describes a data structure that I exchange over a UDP channel. For example, here is my input XML file describing the data structure:

    <ds>
      <uint32 name='a'/>
      <uint32 name='b'/>
      <string name='c'/>
      <int16 name='d'/>
      <uint32 name='e'/>
    </ds>

Parsing this XML file with Perl's XML::Simple gives me the following hash:

    $VAR1 = {
        'uint32' => {
            'e' => {},
            'a' => {},
            'b' => {}
        },
        'int16' => {
            'name' => 'd'
        },
        'string' => {
            'name' => 'c'
        }
    };

As you can see, the hash does not preserve the order of the elements, so after parsing I can no longer tell where the 'e' field sits relative to the beginning of the data structure.

I would like to find the offsets of each of these elements.

I tried to find a Perl XML parser that lets me walk through an XML file sequentially, with something like a getnexttag() function, but could not find one.

What is the best way to do this programmatically? If not Perl, then what other language is best suited to the job?

3 answers

You will need to use a stream parser with appropriate callbacks. With larger data sets it will also improve parsing speed (and reduce memory consumption, if everything is done correctly), which is an awesome bonus.

I recommend XML::SAX; its documentation includes an introduction to the module.

Provide a start_element callback so that you can read each element, one at a time, in document order.


Could you write me a simple example?

Yes, and I already have one! ;-)

The snippet below parses the data provided by the OP and prints the name of each element, along with the key/value pairs of its attributes.

This should be fairly easy to understand, but if you have any questions, feel free to add them as comments, and I will update this post with more details.

    use warnings;
    use strict;

    use XML::SAX;

    my $parser = XML::SAX::ParserFactory->parser(
        Handler => ExampleHandler->new
    );

    $parser->parse_string(<<EOT
    <ds>
      <uint32 name='a'/>
      <uint32 name='b'/>
      <string name='c'/>
      <int16 name='d'/>
      <uint32 name='e'/>
    </ds>
    EOT
    );

    # # # # # # # # # # # # # # # # # # # # # # # #

    package ExampleHandler;

    use base ('XML::SAX::Base');

    # Called once per opening tag, in document order
    sub start_element {
        my ($self, $el) = @_;

        print "found element: ", $el->{Name}, "\n";

        for my $attr (values %{$el->{Attributes}}) {
            print " '", $attr->{Name}, "' = '", $attr->{Value}, "'\n";
        }

        print "\n";
    }

Output:

    found element: ds

    found element: uint32
     'name' = 'a'

    found element: uint32
     'name' = 'b'

    found element: string
     'name' = 'c'

    found element: int16
     'name' = 'd'

    found element: uint32
     'name' = 'e'
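Once the callbacks hand you the fields in document order, the offsets the question asks about come down to a running sum. What follows is only a minimal sketch under stated assumptions, not a definitive implementation: it assumes a packed layout (no alignment padding) and a made-up fixed 16-byte `string` field, because the question does not say how strings are encoded. The names `%size_of`, `@fields` and `compute_offsets` are hypothetical, and the field list is hard-coded here to stand in for what a `start_element` callback would collect.

```perl
use strict;
use warnings;

# ASSUMPTION: wire sizes in bytes. The fixed 16-byte string is a placeholder;
# the question does not define how strings are laid out.
my %size_of = (
    uint32 => 4,
    int16  => 2,
    string => 16,
);

# Fields in document order. In the SAX handler these could be collected in
# start_element (skipping the <ds> root, which has no name attribute) with:
#   push @fields, [ $el->{Name}, $el->{Attributes}{'{}name'}{Value} ];
my @fields = (
    [ uint32 => 'a' ],
    [ uint32 => 'b' ],
    [ string => 'c' ],
    [ int16  => 'd' ],
    [ uint32 => 'e' ],
);

# Walk the fields in order, keeping a running byte offset (no padding assumed).
sub compute_offsets {
    my ($fields, $sizes) = @_;

    my $offset = 0;
    my @layout;
    for my $field (@$fields) {
        my ($type, $name) = @$field;
        die "unknown type '$type'\n" unless exists $sizes->{$type};

        push @layout, { type => $type, name => $name, offset => $offset };
        $offset += $sizes->{$type};
    }

    return (\@layout, $offset);
}

my ($layout, $total) = compute_offsets(\@fields, \%size_of);

printf "%-6s %s at offset %d\n", @{$_}{qw(type name offset)} for @$layout;
print "total size: $total bytes\n";
```

With these assumed sizes, 'e' lands at offset 26 and the whole structure is 30 bytes. If your strings are length-prefixed or variable-length, every offset after the first string can only be determined at run time, so the size table would need to be replaced by something smarter.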

I am not satisfied with XML::SAX; are there any other modules?

Yes, there are plenty to choose from. Pick the one that suits your specific problem.


What is the difference between different parsing methods?

I also recommend reading an FAQ on XML parsing in Perl. It lays out the pros and cons of using a tree parser (e.g. XML::Parser::Simple) versus a stream parser.


This is certainly possible with Perl.

Here is an example with XML::LibXML:

    use strict;
    use warnings;
    use feature 'say';

    use XML::LibXML;

    my $xml = XML::LibXML->load_xml( location => 'test.xml' );

    my ( $dsNode ) = $xml->findnodes( '/ds' );

    my @kids = $dsNode->nonBlankChildNodes;  # The indices of this array
                                             # give the position of each field

    my $first_kid = shift @kids;             # Pull off the first kid
    say $first_kid->toString;                # <uint32 name="a"/>

    my $second = $first_kid->nextNonBlankSibling();
    my $third  = $second->nextNonBlankSibling();
    say $third->toString;                    # <string name="c"/>
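If you want all fields with their positions rather than stepping through siblings by hand, a loop over the child elements does the same job. This is just a sketch: it parses the XML from a string instead of `test.xml` so that it is self-contained, and the `@order` array is a name I made up for illustration.

```perl
use strict;
use warnings;

use XML::LibXML;

my $xml = <<'EOT';
<ds>
  <uint32 name='a'/>
  <uint32 name='b'/>
  <string name='c'/>
  <int16 name='d'/>
  <uint32 name='e'/>
</ds>
EOT

my $doc = XML::LibXML->load_xml( string => $xml );

# findnodes returns the matching element nodes in document order
my @kids = $doc->findnodes('/ds/*');

# Record [ index, type, name ] for every field
my @order;
for my $i ( 0 .. $#kids ) {
    my $kid = $kids[$i];
    push @order, [ $i, $kid->nodeName, $kid->getAttribute('name') ];
    printf "%d: %s %s\n", $i, $kid->nodeName, $kid->getAttribute('name');
}
```

The index here is the ordinal position of the field, not a byte offset; turning it into a byte offset still needs the size of each type.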

Here is an example using XML::Twig:

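    use XML::Twig;

    # $your_xml_data holds the XML document as a string
    XML::Twig->new( twig_handlers => { 'ds/*' => \&each_child } )
             ->parse( $your_xml_data );

    # Called once for each child of <ds>, in document order
    sub each_child {
        my ($twig, $child) = @_;
        printf "tag %s : name = %s\n", $child->name, $child->{att}->{name};
    }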

Output:

    tag uint32 : name = a
    tag uint32 : name = b
    tag string : name = c
    tag int16 : name = d
    tag uint32 : name = e

Source: https://habr.com/ru/post/1387896/

