XPath query when the XML format varies

I have a number of variable types, such as:

abc1A, abc1B, abc3B, ... xyz1A, xyz2A, xyz3C, ... data1C, data2A, ... 

They are stored in XML using several different notations:

 <area name="DataMap">
   <int name="number" nullable="true">
     <case var="abc2,abc3,abc5">11</case>
     <case var="abc4,abc6*">8</case>
     <case var="data1,xyz7,xyz8">22</case>
     <case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case>
     <case var="xyz{6,4A,4B,4C}">20</case>
     <case var="other01">15</case>
   </int>
 </area>

I want to query which value a given instance, for example xyz5A, maps to. The query should return 24, but I do not know in advance whether the instance is referenced in the XML node explicitly, as in "xyz4A", with a wildcard, as in "xyz4*", or inside curly braces, as shown above.

A query along these lines successfully returns the result:

 xpath '/area[@name="DataMap"]/int[@name="number"]/case[contains(@var,"xyz")][contains(@var,"5A")]' 

But the same approach also returns a hit for data5A, which is wrong:

 xpath '/area[@name="DataMap"]/int[@name="number"]/case[contains(@var,"data")][contains(@var,"5A")]' 

Is there an XPath or other query construct that can handle the inconsistent (but valid) XML above? It seems I can only query for explicit string matches, not for the wildcard and curly-brace notations.

+2
2 answers

As long as you are in bash/Perl, you are probably tied to libxml, and libxml does not support XPath 2.0. There are many questions on SO about XPath/XSLT 2.0 with libxml/libxslt and Perl.

XPath 1.0 has a number of (limited, I must admit) string functions, and you could try to piece them together. I experimented a bit, did not like the result, and did not manage to cover all possible cases. You would end up with ugly constructs like:

 ... or (contains(@var, ',xyz{') and contains(substring-before(substring-after(@var, ',xyz{'), '}'), '5A') and (contains(substring-before(substring-after(@var, ',xyz{'), '}'), ',5A,') or starts-with(substring-after(@var, ',xyz{'), '5A,') or starts-with(substring-after(@var, ',xyz{'), '5A}') or substring-after(substring-before(substring-after(@var, ',xyz{'), '}'), ',5A') = '')) or ... 

And then you realize that the substring-* functions work on the first occurrence of the match string, so you need even more layers of and and or to handle cases like yours:

 <case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case> 

where there are several xyz{ groups, and the one you need is not the first.
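To make the first-occurrence problem concrete, here is a small sketch in Python rather than XPath. It assumes that str.partition is a fair stand-in for substring-after and substring-before when the separator is present, since both split at the first match; the variable names are made up for illustration.

```python
# Python analogue of the XPath 1.0 expression
#   substring-before(substring-after(@var, ',xyz{'), '}')
# str.partition, like those XPath functions, splits at the FIRST occurrence.
var = "data3A,xyz{9},xyz{5A,5B,5C}"

after_first_open = var.partition(",xyz{")[2]       # text after the first ',xyz{'
first_group = after_first_open.partition("}")[0]   # text up to the next '}'

print(first_group)           # 9
print("5A" in first_group)   # False, although xyz5A is listed in the second group
```

The expression only ever sees the `{9}` group, so a test for 5A fails unless further nested substring-after calls are added for every additional group.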

I think this is a case where you should forget that you have XML, do what Perl is good at, and treat it as text. As much as I like XML tools for XML processing and data retrieval, you will probably be better off with regular expressions and string manipulation in a language that was designed for it.

+1

I think the smartest thing would be to iterate over all the variables and programmatically find matches, rather than asking XPath to do this.
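A minimal sketch of that idea, written in Python for illustration rather than Perl. The helper names (split_top_level, pattern_to_regex, lookup) are hypothetical, and the matching rules are assumptions read off the sample data: a pattern without * must match the whole name, * matches any suffix, and braces are not nested.

```python
import re
import xml.etree.ElementTree as ET

XML = """<area name="DataMap">
  <int name="number" nullable="true">
    <case var="abc2,abc3,abc5">11</case>
    <case var="abc4,abc6*">8</case>
    <case var="data1,xyz7,xyz8">22</case>
    <case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case>
    <case var="xyz{6,4A,4B,4C}">20</case>
    <case var="other01">15</case>
  </int>
</area>"""


def split_top_level(var):
    """Split a var attribute on commas that are not inside {...}."""
    parts, depth, current = [], 0, ""
    for ch in var:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def pattern_to_regex(pattern):
    """Translate one pattern: {a,b} -> (?:a|b), * -> .*, the rest literal."""
    out = []
    for ch in pattern:
        if ch == "{":
            out.append("(?:")
        elif ch == "}":
            out.append(")")
        elif ch == ",":
            out.append("|")  # after splitting, commas only remain inside braces
        elif ch == "*":
            out.append(".*")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def lookup(name, xml_text=XML):
    """Return the text of the first case whose var list matches name."""
    root = ET.fromstring(xml_text)
    for case in root.iterfind("./int[@name='number']/case"):
        for pattern in split_top_level(case.get("var", "")):
            if re.fullmatch(pattern_to_regex(pattern.strip()), name):
                return case.text
    return None


print(lookup("xyz5A"))   # 24
print(lookup("data5A"))  # None
```

XPath is used only to reach the case nodes; the brace and wildcard handling lives in ordinary code, which avoids the false hit for data5A.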

Failing that, I have at least a few thoughts on the braces; unfortunately, they probably do not help much with the * wildcard.

There seem to be Perl XPath implementations where you could write .../case[@var =~ /some_regex/], perhaps .../case["xyz4A" =~ to_regex(@var)], and perhaps even .../case[explode_braces(@var) =~ /(^|,)xyz4A(,|$)/] (with a suitable explode_braces, of course). See http://www.perlmonks.org/?node_id=831612, for example. I would expect the explode_braces variant to be much, much easier to get working than the first alternative, and I use regular expressions quite a lot. Then again, your var patterns look like bash-style patterns, and converting those to a Perl regex should also be relatively simple, so if the second idea works, you might be good to go.

If that doesn't work, maybe hook into your XML parser, or run a step right in front of it, and fix this awful XML design by expanding the curly braces?

 $input =~ s/\bvar="([^"]*)"/'var="' . explode_braces($1) . '"'/eg; 

(Or something very similar; sorry, I have not written much Perl in recent years. This also assumes that your XML uses only one kind of attribute quoting, here double quotes, which should be easy to fix, and that var=" appears nowhere except in these attributes, which may be a much tougher constraint.)
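For illustration, here is one way such an explode_braces could look, sketched in Python rather than Perl. It is an assumption about the intended behavior, not the answerer's implementation: braces are not nested, each brace group sits at the end of its item, and the prefix is whatever precedes the { back to the previous comma. The normalize helper is likewise hypothetical and carries the same var=" caveats as the substitution above.

```python
import re


def explode_braces(var):
    """Expand 'xyz{5A,5B}' into 'xyz5A,xyz5B'; assumes no nested braces."""
    def expand(match):
        prefix, items = match.group(1), match.group(2)
        return ",".join(prefix + item.strip() for item in items.split(","))
    # group 1: the prefix directly before '{'; group 2: the brace contents
    return re.sub(r"([^,{}]*)\{([^{}]*)\}", expand, var)


def normalize(xml_text):
    """Rewrite every double-quoted var="..." attribute with braces expanded."""
    return re.sub(
        r'\bvar="([^"]*)"',
        lambda m: 'var="' + explode_braces(m.group(1)) + '"',
        xml_text,
    )


line = '<case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case>'
print(normalize(line))
# <case var="data3A,xyz9,xyz5A,xyz5B,xyz5C">24</case>
```

Once the attribute is flattened like this, an anchored test such as /(^|,)xyz5A(,|$)/ matches exact names, and data5A no longer produces a hit; the * wildcard still needs separate handling.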

0

Source: https://habr.com/ru/post/1203876/

