What is the most readable way in XPath to write "is the value of X a member of the sequence S"?

XPath 2.0 adds several functions and pieces of syntax to 1.0 for working with sequences. Some of them do not really add anything the language could not already do in 1.0 (with node sets), but they make it easier to express the desired logic in more readable ways. This increases the likelihood that the programmer will get the code right, and keep it that way. For instance,

  • empty(s) is equivalent to not(s), but its intent is much clearer when you want to test whether a sequence is empty.
    • Correction: the effective boolean value of a sequence is in general more complicated. For instance, empty((0)) != not((0)). The same applies to exists(s) vs. s in a boolean context. However, there are domains of s where empty(s) is equivalent to not(s), so the two can be used interchangeably within those domains. But this goes to show that using empty() can make a non-trivial difference in making code easier to understand.
  • Similarly, exists(s) is equivalent to boolean(s), which already existed in XPath 1.0 (or just s in a boolean context), but again it is much clearer about intent.
  • Quantified expressions; for example, "some $x in sequence satisfies test($x)" would be equivalent to boolean(sequence[test(.)]) (although the new syntax is more flexible, since you don't have to worry about losing the context item: you have a variable that refers to it).
  • Similarly, "every $x in sequence satisfies test($x)" would be equivalent to not(sequence[not(test(.))]), but is more readable.
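
To make the correction about effective boolean values concrete, here is a small sketch in XPath 2.0. Each line is a separate expression, and the literal values are chosen purely for illustration; the results in the comments follow from the rules for the effective boolean value.

```xpath
(: empty() and exists() look only at the number of items :)
empty(())         (: true :)
empty((0))        (: false: the sequence contains one item :)
exists((0))       (: true :)
exists((''))      (: true :)

(: not() and boolean() use the effective boolean value instead :)
not(())           (: true :)
not((0))          (: true: the effective boolean value of 0 is false :)
boolean((0))      (: false :)
boolean((''))     (: false: the zero-length string is false :)
```

So empty(s) and not(s) agree for the empty sequence and for sequences of nodes, but not for a single 0 or a single zero-length string.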

These functions and syntax were clearly added, at some cost, solely to serve the goal of writing XPath that maps more easily onto the way people think. This reflects what experienced developers know: understandable code is far superior to code that is hard to understand.

Given all that ... what would be the clearest and most understandable way to write an XPath test expression that asks

Does the value X occur in the sequence S?

Some ways to do this: (Note: I use the notation X and S here to stand for the value and the sequence, but I do not mean to imply that these subexpressions are element name tests, or that they are simple expressions. They can be complex.)

  1. X = S : This is arguably one of the least readable, because the reader has to
    • work out which of X and S is a sequence and which is a single value
    • understand general comparisons, whose semantics are not obvious from the syntax
      • However, one advantage of this form is that it puts the topic (X) before the comment ("is a member of S"), which I think helps readability.
      • See also CMS's good point about readability when the syntax or the names make the cardinality of X and S obvious.
  2. index-of(S, X) : This is clear about which operand is meant as a value and which as a sequence (if you remember the argument order of index-of()). But it expresses more than we need: it asks for the index when all we really want to know is whether X is in S. That is somewhat misleading to the reader. An experienced developer will work out what is meant, with some effort and some understanding of the context. But the more we rely on context to understand the intent of each line, the more understanding the code becomes a circular (spiral) and potentially Sisyphean task! In addition, since index-of() is designed to return a list of all the positions at which X occurs, it may be more expensive than necessary: a smart processor evaluating X = S would not have to examine all of S, nor visit its items in order; but for index-of(S, X), the positions must be reported in order and every item of S must be compared with X. Another drawback of index-of() is that it is limited to eq comparison; you cannot, for example, use it to find out whether a node is identical to any node in a given sequence.
    • Correction: used as a conditional test, this form can raise a runtime error: Effective boolean value is not defined for a sequence of two or more items starting with a numeric value. (But at least we won't get a wrong boolean, since index-of() never returns zero.) If S can contain multiple occurrences of X, that is another good reason to prefer form 3 or 6.
  3. exists(index-of(S, X)) : makes the intent clearer, and lets a sufficiently smart processor avoid the performance penalty.
  4. some $m in S satisfies $m eq X : This is very clear and matches our intent exactly. It is long compared to 1, which in itself can hurt readability. But perhaps that is an acceptable price for clarity. Keep in mind that X and S may themselves be complex expressions; they are not necessarily just variable references. An advantage is that, since the eq operator is explicit, you can replace it with is or any other comparison operator.
  5. S[. eq X] : clearer than 1, but shares the semantic drawbacks of 2: it computes all members of S that are equal to X. In fact, it can give a false negative (a wrong effective boolean value) if X itself has a false effective boolean value. For example, (0, 1)[. eq 0] returns 0, whose effective boolean value is false, even though 0 does occur in (0, 1).
  6. exists(S[. eq X]) : Clearer than 1, 2, 3, and 5. Not as clear as 4, but shorter. Avoids the drawbacks of 5 (or at least most of them, depending on how smart the processor is).
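
As a rough side-by-side sketch, here are the six forms in list order, used as boolean tests with X = 0. The literal sequences are illustrative assumptions only, each line is a separate XPath 2.0 expression, and forms 2 and 5 are wrapped in boolean() to stand in for "used as a conditional test".

```xpath
(: form 1 :) 0 = (0, 1)                             (: true :)
(: form 2 :) boolean(index-of((0, 1), 0))           (: true: index-of returns (1) :)
             boolean(index-of((0, 1, 0), 0))        (: error: index-of returns (1, 3) :)
(: form 3 :) exists(index-of((0, 1, 0), 0))         (: true :)
(: form 4 :) some $m in (0, 1) satisfies $m eq 0    (: true :)
(: form 5 :) boolean((0, 1)[. eq 0])                (: false: the predicate selects the item 0 :)
(: form 6 :) exists((0, 1)[. eq 0])                 (: true :)
```

Only forms 2 and 5 depend on the effective boolean value of the intermediate result, which is where the error and the false negative come from.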

I am leaning toward the last one at the moment: exists(S[. eq X])

How about you ... As a developer coming to a complex, unfamiliar XSLT or XQuery or other program that uses XPath 2.0, and wanting to figure out what that program does, which would you find easiest to read?

Sorry for the long question. Thanks for reading this.

Edit: I changed = to eq wherever possible in the discussion above to make it easier to see where the value comparison was intended (as opposed to the general comparison).

+4
4 answers

For what it's worth, if the names or the context make it clear that X is a singleton, I am happy to use your first form, X = S, for example when I want to check an attribute value against a set of possible values:

 <xsl:when test="@type = ('A', 'A+', 'A-', 'B+')" /> 

or

 <xsl:when test="@type = $magic-types"/> 

If I think there is a risk of confusion, I like your sixth formulation. The less often I have to remember the rules for computing an effective boolean value, the less often I make mistakes with them.

+3

I prefer this one:

 count(distinct-values($seq)) eq count(distinct-values(($x, $seq))) 

When $x is itself a sequence, this expression implements the (value-based) subset relation between two sets of values that are represented as sequences. This implementation of subset has only linear time complexity, compared with other ways of expressing it, which have O(N^2) time complexity.

To summarize, the question of whether a single value belongs to a set of values is a special case of the question of whether one set of values is a subset of another. If we have a good implementation of the latter, we can simply use it to answer the former.
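
A small worked sketch of the counting expression above, with the variables replaced by literal string sequences (the values are made up for illustration). Each expression stands on its own.

```xpath
(: $x = ('a', 'c') is a subset of $seq = ('a', 'b', 'c') :)
count(distinct-values(('a', 'b', 'c')))
  eq count(distinct-values((('a', 'c'), ('a', 'b', 'c'))))    (: 3 eq 3, true :)

(: $x = ('a', 'd') is not a subset of $seq :)
count(distinct-values(('a', 'b', 'c')))
  eq count(distinct-values((('a', 'd'), ('a', 'b', 'c'))))    (: 3 eq 4, false :)

(: the membership case: $x is the single value 'b' :)
count(distinct-values(('a', 'b', 'c')))
  eq count(distinct-values(('b', 'a', 'b', 'c')))             (: 3 eq 3, true :)
```

Adding $x to $seq leaves the number of distinct values unchanged exactly when every value in $x already occurs in $seq.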

+2

The functx library has a nice implementation of this function, so you can use

 functx:is-node-in-sequence($X, $Y) 

(this specific function can be found at http://www.xqueryfunctions.com/xq/functx_is-node-in-sequence.html )

The entire functx library is available for XQuery ( http://www.xqueryfunctions.com/ ) and XSLT ( http://www.xsltfunctions.com/ )

MarkLogic ships the functx library with its core product; other vendors may do so as well.

+2

Another possibility, when you want to know whether node X exists in sequence S, is

 exists((X) intersect S) 

I think it is quite readable and concise. But it only works when X and the items in S are nodes; if you try to ask

 exists(('bob') intersect ('alice', 'bob')) 

you will get a runtime error. In the program I'm working on right now, I need to compare strings, so this is not an option.

As Dimitri notes, whether a node occurs in a sequence is a question of identity, not of value comparison.
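
For comparison, a minimal sketch of the two identity-based tests side by side. The element name item is a hypothetical placeholder, and the expressions assume they are evaluated against some document in which //item may select nodes.

```xpath
(: is the first item element among all item elements? :)
exists((//item)[1] intersect //item)

(: the same question as a quantified expression, with "is" for node identity :)
some $m in //item satisfies $m is (//item)[1]
```

Both compare nodes by identity, so two distinct nodes with equal string values do not count as a match.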

0

Source: https://habr.com/ru/post/1469346/

