I have a bunch of XML files, each about 1-2 megabytes in size. Actually, more than a bunch: there are millions. They are all well-formed, and many have even been validated against their schema (confirmed with libxml2).
All were created by the same application, so they are in a consistent format (although this may theoretically change in the future).
I want to check the value of one element in each file from a Perl script. Speed is important (I would like to take less than a second per file), and, as already noted, I know that the files are well-formed.
I'm sorely tempted to just "open" the files in Perl and scan through until I see the element I'm looking for, grab the value (which is near the beginning of the file), and close the file.
On the other hand, I could use an XML parser (which could protect me from future XML formatting changes), but I suspect it will be slower than I would like.
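For comparison, the parser-based route could look something like the sketch below. It assumes the XML::LibXML module (the Perl binding for libxml2) is installed, and it uses the element and attribute names from the sample in the update (`someparentnode`, `attrib="pickme"`, `node`). The function name `extract_value_with_reader` is made up for illustration. A pull parser like this can stop at the first match, so in principle it should not need to read the rest of the file, though I have not measured whether it is fast enough.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Pull-parse the document and return the text of the <node> child of the
# first <someparentnode> whose "attrib" attribute equals "pickme".
# Returns undef if no such element exists.
# Element and attribute names are taken from the sample data (assumption).
sub extract_value_with_reader {
    my (%source) = @_;    # e.g. (location => $path) or (string => $xml)

    my $reader = XML::LibXML::Reader->new(%source)
        or die "cannot create XML reader\n";

    # nextElement() skips forward to the next element with the given name
    # and returns 1 while one is found.
    while ($reader->nextElement('someparentnode') == 1) {
        my $attrib = $reader->getAttribute('attrib');
        next unless defined $attrib && $attrib eq 'pickme';

        # Build a small DOM for just this element and query it.
        my $element = $reader->copyCurrentNode(1);
        return $element->findvalue('node');
    }
    return;
}
```

It would be called as `my $value = extract_value_with_reader(location => $path);` for each file.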
Can you recommend a suitable approach and / or parser?
Thanks in advance.
Update
Here is the structure / complexity of the data I'm trying to pull:
<doc>
...
<someparentnode attrib="notme" attrib2="5">
<node>Not this one</node>
</someparentnode>
<someparentnode attrib="pickme" attrib2="5">
<node>This is the data I want</node>
</someparentnode>
<someparentnode attrib="notme"
attrib2="reallyreallylonglineslikethisonearewrapped">
<node>Not this one either and it may be
wrapped too.</node>
</someparentnode>
...
</doc>
The hierarchy goes several levels deeper than this, but I think it covers what I am trying to do.
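To make the "just open the file and scan" idea concrete, here is a minimal sketch of what I have in mind. It is not a real XML parser: it assumes the application keeps writing the format shown above (double-quoted attributes, `<node>` as the first child, no CDATA sections or comments in the way). The function name `extract_value` and the 64 KB chunk size are arbitrary choices for illustration. Reading in chunks rather than line by line copes with tags and values that are wrapped across lines.

```perl
use strict;
use warnings;

# Read the file handle chunk by chunk and return the text of the <node>
# child of the first <someparentnode> whose attrib is "pickme".
# Returns undef if the pattern never matches.
sub extract_value {
    my ($fh, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;

    my $buffer = '';
    while (read($fh, my $chunk, $chunk_size)) {
        $buffer .= $chunk;

        # [^>]* also matches newlines, so wrapped attributes are fine;
        # the /s flag lets (.*?) span a wrapped element value.
        if ($buffer =~ m{<someparentnode\b[^>]*\sattrib="pickme"[^>]*>\s*<node>(.*?)</node>}s) {
            return $1;    # stop reading as soon as the value is found
        }
    }
    return;
}
```

Usage would be roughly `open my $fh, '<', $path or die $!; my $value = extract_value($fh); close $fh;`. The buffer is rescanned after every chunk, which is only acceptable because the value sits near the start of the file.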