This is an interesting question. Most of the problems have already been mentioned in @lwburk's answer and in its comments. Because I want to spell out the harder issues hidden in this question for the casual reader, my answer is probably more complex and more verbose than the OP requires.
XPath 1.0 Features Related to This Problem
In XPath, each step, and each node in the set of selected nodes, is evaluated independently. This means that
- a subexpression has no general way to access data calculated in a previous subexpression, or to share data calculated in this subexpression with other subexpressions
- a node has no general way to refer to the node that was used as the context node in a previous subexpression
- a node has no general way to refer to the other selected nodes
- if all selected nodes must be compared with one specific node, then that node must be uniquely identifiable in a way that is common to all selected nodes
(Well, actually, I'm not 100% sure if this list is absolutely correct in every case. If someone knows the XPath quirks better, comment on or correct this answer by editing it.)
Despite the lack of general solutions, some of these limitations can be overcome if the structure of the document is known well enough, and/or if the previously used axis can be "reversed" by another axis that serves as a backlink, i.e. one that matches only the nodes that were used as the context node in the previous expression. A common example is using the parent axis after first using the child axis (the opposite case, from child to parent, is not uniquely reversible without additional information). In such cases, the information from the previous steps is, strictly speaking, recreated at a later step rather than accessed as previously known information.
Unfortunately, in this case, I could not come up with any other solution for referencing previously known nodes, except for using XPath variables (which must be defined in advance).
XPath specifies the syntax for referencing a variable, but does not specify the syntax for defining variables; the way that variables are defined depends on the environment in which XPath is used. In fact, since the recommendation says that "the variable bindings used to evaluate a subexpression are always the same as those used to evaluate a containing expression," you can also argue that XPath explicitly prohibits the definition of variables inside an XPath expression.
Reformulated problem
In your question, the problem is, given a <dt>, to identify either the following <dd> elements or the originally given node after the context node has changed. Identifying the originally given <dt> is critical because, for each node in the node-set being filtered, the predicate expression is evaluated with that node as the context node; therefore, you cannot refer to the original <dt> inside a predicate unless there is a way to identify it after the context has changed. The same applies to the <dd> elements that are following siblings of the given <dt>.
If you use variables, one could debate whether there is a significant difference between 1) using XPath variable syntax together with Nokogiri's specific way of declaring the variable, and 2) using Nokogiri's extended XPath syntax, which allows you to use Ruby variables in an XPath expression. In both cases the variable is defined in an environment-specific way, and the meaning of the XPath expression is clear only if the variable definition is available. A similar situation exists in XSLT, where in some cases you can choose between 1) defining a variable with <xsl:variable> before your XPath expression and 2) using current() inside the XPath expression, which is an XSLT extension.
Solution using node-set variables and the Kayessian method
You can select all <dd> elements following the current <dt> element using following-sibling::dd (set A). You can also select all <dd> elements following the next <dt> element using following-sibling::dt[1]/following-sibling::dd (set B). The set difference A\B then leaves exactly the <dd> elements you wanted (the elements that are in set A but not in set B). If the variable $setA contains the nodes of A and the variable $setB contains the nodes of B, the set difference can be obtained with a modification of the Kayessian technique:
dds = $setA[count(.|$setB) != count($setB)]
The simplest workaround without any variables
Currently, your approach is to select all the <dt> elements and then try to associate the value of each such element with the values of the corresponding <dd> elements in one operation. Could this association be turned around? You would first select all the <dd> elements and then, for each <dd>, find the corresponding <dt>. This means that you access the same <dt> elements several times and that each operation adds only one new <dd> value. This can affect performance, and the Ruby code may become more complex.
The good side is the simplicity of the required XPath. Given a <dd> element, finding the appropriate <dt> is surprisingly simple: preceding-sibling::dt[1]
Adapted to your current Ruby code:
dl.xpath('dd').each do |dd|
  dt = dd.xpath("preceding-sibling::dt[1]")