How to get xpath from an XmlNode instance

Question

Can someone provide code that produces an XPath for a System.Xml.XmlNode instance?

Thanks!

+46

c# xml .net .net-2.0

joe Oct 27 '08 at 20:12

14 answers

It's true that any number of XPath expressions can select the same node in an instance document. The simplest way to build an expression that unambiguously selects a specific node is a chain of node tests that use the node's position in the predicate, for example:

 /node()[1]/node()[2]/node()[6]/node()[1]/node()[2]

Obviously, this expression doesn't use element names, but if all you're trying to do is locate a node in the document, you don't need its name. It also can't be used to find attributes (attributes aren't children of their element, so a node() step never reaches them and they have no position; you can only select them by name), but it will find every other kind of node.

To build this expression, you need a method that returns the node's position among its parent's child nodes, because XmlNode doesn't expose this as a property:

    static int GetNodePosition(XmlNode child)
    {
        for (int i = 0; i < child.ParentNode.ChildNodes.Count; i++)
        {
            if (child.ParentNode.ChildNodes[i] == child)
            {
                // tricksy XPath, not starting its positions at 0 like a normal language
                return i + 1;
            }
        }
        throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
    }

(There's probably a more elegant way to do this with LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
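For what it's worth, the LINQ variant could look roughly like the sketch below. It assumes .NET 3.5 or later (so it won't run on the .NET 2.0 the question is tagged with), and the class name NodePositionLinq is just illustrative. The Cast<XmlNode>() call is needed because XmlNodeList only implements the non-generic IEnumerable.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class NodePositionLinq
{
    // Returns the 1-based XPath position of `child` among its parent's child nodes.
    // First() throws InvalidOperationException if the node is somehow missing,
    // mirroring the exception thrown by the loop-based version.
    public static int GetNodePosition(XmlNode child)
    {
        return child.ParentNode.ChildNodes
            .Cast<XmlNode>()                               // XmlNodeList is non-generic
            .Select((node, index) => new { node, index })  // pair each node with its 0-based index
            .First(pair => pair.node == child)
            .index + 1;                                    // XPath positions start at 1
    }
}
```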

Then you can write a recursive method as follows:

    static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        // the path to a node is the path to its parent, plus "/node()[n]", where
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            GetNodePosition(node)
            );
    }

As you can see, I hacked it to handle attributes as well.
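One caveat worth checking if you use the positional approach: XmlDocument exposes the XML declaration and the DOCTYPE as child nodes of the document, but the XPath 1.0 data model has no such nodes, so counting them would shift the root element's position. Below is a self-contained sketch of the two methods above with that adjustment. It's an assumption-laden variant, not the original answer's code: it also assumes no adjacent text/CDATA siblings (XPath merges those into one text node) and attribute names without namespace prefixes.

```csharp
using System;
using System.Xml;

public static class PositionalXPath
{
    // Builds a path like "/node()[1]/node()[2]/@attr" that selects exactly `node`.
    public static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // Attributes hang off OwnerElement and can only be selected by name.
            var attr = (XmlAttribute)node;
            return GetXPathToNode(attr.OwnerElement) + "/@" + attr.Name;
        }
        if (node.ParentNode == null)
        {
            // The document node is the XPath root; its path is empty.
            return "";
        }
        return GetXPathToNode(node.ParentNode) + "/node()[" + GetNodePosition(node) + "]";
    }

    // 1-based position among the siblings that XPath can actually see.
    static int GetNodePosition(XmlNode child)
    {
        int position = 0;
        foreach (XmlNode sibling in child.ParentNode.ChildNodes)
        {
            // The XML declaration and DOCTYPE exist in the DOM but are not
            // XPath nodes, so they must not be counted.
            if (sibling.NodeType == XmlNodeType.XmlDeclaration ||
                sibling.NodeType == XmlNodeType.DocumentType)
            {
                continue;
            }
            position++;
            if (sibling == child)
            {
                return position;
            }
        }
        throw new InvalidOperationException("Child node not found in its parent's ChildNodes.");
    }
}
```

A quick way to gain confidence is the round trip: doc.SelectSingleNode(PositionalXPath.GetXPathToNode(n)) should return n itself.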

Jon posted his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm picking on Jon. (I'm not. I'm pretty sure the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm about to make is an important one for anyone who works with XML to think about.

I suspect that Jon's solution grew out of something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to working with is structured this way. You can spot these developers because they use the terms “node” and “element” interchangeably. This leads them to come up with solutions that treat every other node type as a special case. (I was one of these guys myself for a very long time.)

It feels like a simplifying assumption while you're making it. It isn't. It makes tasks harder and code more complex. It leads you to bypass the parts of XML technology (like the XPath node() node test) that are specifically designed to treat all node types generically.

There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements were, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"
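To make the point concrete, here's a small sketch (class and method names are illustrative). With mixed content such as <p>Hello <b>world</b>!</p>, a node() step addresses each of the three children the same way, text or element, while an element-only view (the * node test, or GetElementsByTagName) sees only the <b>.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class MixedContentDemo
{
    // Builds one positional path per child of `parent`, regardless of node type.
    // `parentPath` is assumed to be an already-known path that selects `parent`.
    public static List<string> PositionalChildPaths(XmlNode parent, string parentPath)
    {
        var paths = new List<string>();
        for (int i = 0; i < parent.ChildNodes.Count; i++)
        {
            // node() matches elements, text, comments and PIs alike; positions are 1-based.
            paths.Add(parentPath + "/node()[" + (i + 1) + "]");
        }
        return paths;
    }
}
```

For the example above this yields three paths, and the last one selects the "!" text node, which no element-name path can express.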

+22

Robert Rossney Oct 27 '08 at 21:42

I know this is an old post, but the version I liked best (the one that uses names) was wrong: when the parent node has children with different names, it stops counting the index as soon as it reaches the first preceding sibling with a different name.

Here is my fixed version:

    /// <summary>
    /// Gets the X-Path to a given Node
    /// </summary>
    /// <param name="node">The Node to get the X-Path from</param>
    /// <returns>The X-Path of the Node</returns>
    public string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        // Get the Index
        int indexInParent = 1;
        XmlNode siblingNode = node.PreviousSibling;
        // Loop thru all Siblings
        while (siblingNode != null)
        {
            // Increase the Index if the Sibling has the same Name
            if (siblingNode.Name == node.Name)
            {
                indexInParent++;
            }
            siblingNode = siblingNode.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/name[n]", where
        // n is its position among its same-named siblings.
        return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
    }
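To see the bug described above, here's a sketch that puts the two index loops side by side (the class and method names are made up for the comparison). In <root><a/><b/><a/></root>, the second <a> is a[2], but the early-stopping loop hits <b> first and gives up at 1.

```csharp
using System.Xml;

public static class SiblingIndexDemo
{
    // The earlier loop: stops at the first preceding sibling with a different name.
    public static int IndexStoppingEarly(XmlNode node)
    {
        int index = 1;
        XmlNode current = node;
        while (current.PreviousSibling != null && current.PreviousSibling.Name == current.Name)
        {
            index++;
            current = current.PreviousSibling;
        }
        return index;
    }

    // The fixed loop: counts every preceding sibling that has the same name.
    public static int IndexCountingAll(XmlNode node)
    {
        int index = 1;
        for (XmlNode sibling = node.PreviousSibling; sibling != null; sibling = sibling.PreviousSibling)
        {
            if (sibling.Name == node.Name)
            {
                index++;
            }
        }
        return index;
    }
}
```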

+5

Roemer Aug 12 '13 at 10:25

My 10p worth is a hybrid of Robert's and Corey's answers. I can only claim credit for actually typing the extra lines of code.

    private static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null)
        {
            iIndex++;
            xnIndex = xnIndex.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/node()[n]", where
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            iIndex
            );
    }

+3

James Randle Dec 18 '09 at 1:37

Here's a simple approach I used; it worked for me.

    static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode(".."))
            + "/"
            + (node.NodeType == XmlNodeType.Attribute ? "@" : String.Empty)
            + node.Name;
    }

+3

rugg Aug 9 2018-12-12T00:

There's no such thing as "the" XPath for a node. For any given node there may well be many XPath expressions that will match it.

You could probably walk up the tree to build an expression that matches it, taking into account the index of particular elements etc., but it's not going to be terribly nice code.

What do you need this for? There may be a better solution.
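As an illustration of "many expressions, one node", here's a small check (the helper name is made up for this example). Against <root><foo/><foo><bar attr='value'/><bar other='va'/></foo><foo><bar/></foo></root>, the expressions //@attr, /root/foo[2]/bar[1]/@attr, /node()[1]/node()[2]/node()[1]/@attr and //bar[@attr='value']/@attr all select the same attribute instance.

```csharp
using System.Xml;

public static class ManyPathsDemo
{
    // True when every expression selects the very same node instance.
    public static bool AllSelectSameNode(XmlDocument doc, params string[] xpaths)
    {
        if (xpaths.Length == 0)
        {
            return false;
        }
        XmlNode first = doc.SelectSingleNode(xpaths[0]);
        if (first == null)
        {
            return false;
        }
        foreach (string xpath in xpaths)
        {
            // Reference comparison: XmlDocument hands back its own node objects.
            if (doc.SelectSingleNode(xpath) != first)
            {
                return false;
            }
        }
        return true;
    }
}
```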

+2

Jon Skeet Oct 27 '08 at 20:19

With this, you get a path with node names and positions whenever there are nodes with the same name: "/Services[1]/System[1]/Group[1]/Folder[2]/File[2]"

    public string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
        {
            iIndex++;
            xnIndex = xnIndex.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/name[n]", where
        // n is its position among its same-named siblings.
        return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
    }

+1

René Endress Aug 31 2018-11-11T00:

I found that none of the above worked with XDocument, so I wrote my own code to support XDocument, using recursion. I think this code handles multiple identical nodes better than the other code here, because it first tries to go as deep into the XML path as it can and then backs up to build only what is needed. So if you have /home/white/bob and /home/white/mike, and you want to create /home/white/bob/garage, the code will know how to create it. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed them; but it would be easy to add support for them.

    ' Required at the top of the file:
    Imports System.Linq
    Imports System.Text.RegularExpressions
    Imports System.Xml.Linq
    Imports System.Xml.XPath

    Private Sub NodeItterate(XDoc As XElement, XPath As String)
        'get the deepest path
        Dim nodes As IEnumerable(Of XElement)
        nodes = XDoc.XPathSelectElements(XPath)

        'if it doesn't exist, try the next shallow path
        If nodes.Count = 0 Then
            NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
            'by this time all the required parent elements will have been constructed
            Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
            Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
            Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
            ParentNode.Add(New XElement(NewElementName))
        End If

        'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
        If nodes.Count > 1 Then
            Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
        End If

        'if there is just one element, we can proceed
        If nodes.Count = 1 Then
            'just proceed
        End If
    End Sub

    Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)
        If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
            Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
        End If
        'character class: reject the path if it contains any predicate/axis character
        If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then
            Throw New ArgumentException("Can't create a path based on predicates.")
        End If
        'we will process this recursively.
        NodeItterate(XDoc, XPath)
    End Sub
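Since the rest of the thread is C#, here's a rough C# counterpart of the same create-if-missing idea. It's a sketch under simplifying assumptions, not a translation of the VB above: the names PathBuilder and EnsurePath are hypothetical, and instead of evaluating XPath it walks child elements one name at a time, relative to a context element, so /home/white/bob/garage becomes EnsurePath(home, "white/bob/garage"). Only plain element names are supported.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PathBuilder
{
    // Walks a slash-separated list of element names below `context`,
    // creating any element that doesn't exist yet, and returns the last one.
    public static XElement EnsurePath(XElement context, string path)
    {
        XElement current = context;
        foreach (string name in path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var matches = current.Elements(name).ToList();
            if (matches.Count > 1)
            {
                // Same rule as the VB version: ambiguous paths can't be extended.
                throw new InvalidOperationException("More than one element matches '" + name + "'.");
            }
            if (matches.Count == 1)
            {
                current = matches[0];
            }
            else
            {
                var created = new XElement(name);
                current.Add(created);
                current = created;
            }
        }
        return current;
    }
}
```

Calling it twice with the same path returns the same element the second time, because every step then finds exactly one match.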

+1

cjbarth Sep 27 '11 at 2:18

How about using extension methods? ;) My version (building on others' work) uses the syntax name[index], and the index is omitted when the element has no same-named siblings. The loop that computes an element's index lives in a separate method (also an extension method).

Just drop the following into any utility class (or your main program class); note that extension methods must be declared in a non-nested, non-generic static class.

    static public int GetRank( this XmlNode node )
    {
        // return 0 if unique, else return position 1...n in siblings with same name
        try
        {
            if( node is XmlElement )
            {
                int rank = 1;
                bool alone = true, found = false;

                foreach( XmlNode n in node.ParentNode.ChildNodes )
                    if( n.Name == node.Name ) // sibling with same name
                    {
                        if( n.Equals(node) )
                        {
                            if( ! alone ) return rank; // no need to continue
                            found = true;
                        }
                        else
                        {
                            if( found ) return rank; // no need to continue
                            alone = false;
                            rank++;
                        }
                    }
            }
        }
        catch{}
        return 0;
    }

    static public string GetXPath( this XmlNode node )
    {
        try
        {
            if( node is XmlAttribute )
                return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );

            if( node is XmlText || node is XmlCDataSection )
                return node.ParentNode.GetXPath();

            if( node.ParentNode == null ) // the only node with no parent is the root node, which has no path
                return "";

            int rank = node.GetRank();
            if( rank == 0 ) return String.Format( "{0}/{1}", node.ParentNode.GetXPath(), node.Name );
            else return String.Format( "{0}/{1}[{2}]", node.ParentNode.GetXPath(), node.Name, rank );
        }
        catch{}
        return "";
    }

+1

Plasmabubble Jun 27 '14 at 12:45

I wrote some Excel VBA to do this for a work project. It extracts XPath/value pairs from elements and attributes. The goal was to let business analysts identify and map some XML. I appreciate this is a C# forum, but thought it might be of interest.

    Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)
        Dim chnode As IXMLDOMNode
        Dim attr As IXMLDOMAttribute
        Dim oXString As String
        Dim chld As Long
        Dim idx As Variant
        Dim addindex As Boolean
        chld = 0
        idx = 0
        addindex = False

        'determine the node type:
        Select Case inode.NodeType
            Case NODE_ELEMENT
                If inode.ParentNode.NodeType = NODE_DOCUMENT Then
                    'This gets the root node name but ignores all the namespace attributes
                    oXString = iXstring & "//" & fp(inode.nodename)
                Else
                    'Need to deal with indexing. Where an element has siblings with the same nodeName, it needs to be indexed using [index], eg swapstreams or schedules
                    For Each chnode In inode.ParentNode.ChildNodes
                        If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
                    Next chnode
                    If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed
                        'Lookup the index from the indexes array
                        idx = getIndex(inode.nodename, indexes)
                        addindex = True
                    Else
                    End If
                    'build the XString
                    oXString = iXstring & "/" & fp(inode.nodename)
                    If addindex Then oXString = oXString & "[" & idx & "]"
                    'If type is element then check for attributes
                    For Each attr In inode.Attributes
                        'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value
                        Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
                    Next attr
                End If
            Case NODE_TEXT
                'build the XString
                oXString = iXstring
                Call oSheet(oSh, oXString, inode.NodeValue)
            Case NODE_ATTRIBUTE
                'Do nothing
            Case NODE_CDATA_SECTION
                'Do nothing
            Case NODE_COMMENT
                'Do nothing
            Case NODE_DOCUMENT
                'Do nothing
            Case NODE_DOCUMENT_FRAGMENT
                'Do nothing
            Case NODE_DOCUMENT_TYPE
                'Do nothing
            Case NODE_ENTITY
                'Do nothing
            Case NODE_ENTITY_REFERENCE
                'Do nothing
            Case NODE_INVALID
                'do nothing
            Case NODE_NOTATION
                'do nothing
            Case NODE_PROCESSING_INSTRUCTION
                'do nothing
        End Select

        'Now call Parse2 on each of inode's children.
        If inode.HasChildNodes Then
            For Each chnode In inode.ChildNodes
                Call Parse2(oSh, chnode, oXString, indexes)
            Next chnode
            Set chnode = Nothing
        Else
        End If
    End Sub

It keeps track of element indexes with the following function (the helpers fp, oSheet and IsArrayEmpty aren't shown):

    Function getIndex(tag As Variant, indexes) As Variant
        'Function to get the latest index for an xml tag from the indexes array
        'indexes array is passed from one parser function to the next up and down the tree
        Dim i As Integer
        Dim n As Integer
        If IsArrayEmpty(indexes) Then
            ReDim indexes(1, 0)
            indexes(0, 0) = "Tag"
            indexes(1, 0) = "Index"
        Else
        End If
        For i = 0 To UBound(indexes, 2)
            If indexes(0, i) = tag Then
                'tag found, increment and return the index then exit
                'also destroy all recorded tag names BELOW that level
                indexes(1, i) = indexes(1, i) + 1
                getIndex = indexes(1, i)
                ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
                Exit Function
            Else
            End If
        Next i
        'tag not found so add the tag with index 1 at the end of the array
        n = UBound(indexes, 2)
        ReDim Preserve indexes(1, n + 1)
        indexes(0, n + 1) = tag
        indexes(1, n + 1) = 1
        getIndex = 1
    End Function

+1

Sandy Nov 14 '14 at 21:50

This is even simpler:

    ''' <summary>
    ''' Gets the full XPath of a single node.
    ''' </summary>
    ''' <param name="node"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String
        Dim temp As String
        Dim sibling As Xml.XmlNode
        Dim previousSiblings As Integer = 1

        'I don't want to know that it was a generic document
        If node.Name = "#document" Then Return ""

        'Prime it
        sibling = node.PreviousSibling
        'Percolate up getting the count of all of this node's siblings before it.
        While sibling IsNot Nothing
            'Only count if the sibling has the same name as this node
            If sibling.Name = node.Name Then
                previousSiblings += 1
            End If
            sibling = sibling.PreviousSibling
        End While

        'Mark this node's index, if it has one
        ' Also mark the index to 1 or the default if it does have a sibling just no previous.
        temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()

        If node.ParentNode IsNot Nothing Then
            Return GetXPath(node.ParentNode) + "/" + temp
        End If

        Return temp
    End Function

0

Corey Fournier Jun 23 '09 at 15:44

Another solution to your problem might be to “mark” the XmlNodes you want to identify later with a custom attribute:

    var id = _currentNode.OwnerDocument.CreateAttribute("some_id");
    id.Value = Guid.NewGuid().ToString();
    _currentNode.Attributes.Append(id);

You can store the ID value (in a dictionary, for example) and later find the node again with an XPath query:

    newOrOldDocument.SelectSingleNode(string.Format("//*[contains(@some_id,'{0}')]", id.Value));

I know this isn't a direct answer to your question, but it can help if the reason you want a node's XPath is to be able to “reach” the node again later, after you've lost your reference to it in code.

It also avoids problems when elements are added to or moved around the document, which can break an XPath (or the indexes suggested in other answers).
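A self-contained sketch of this tagging idea follows. The class name NodeTagger and the some_id attribute name are illustrative; pick an attribute name that can't clash with your schema. It assumes the node is an element, since only elements can carry attributes (Attributes is null for other node types), and it matches the ID exactly rather than with contains().

```csharp
using System;
using System.Xml;

public static class NodeTagger
{
    // Hypothetical marker attribute name.
    const string TagAttribute = "some_id";

    // Adds a GUID-valued marker attribute and returns the GUID string to keep.
    public static string Tag(XmlElement element)
    {
        string id = Guid.NewGuid().ToString();
        element.SetAttribute(TagAttribute, id);
        return id;
    }

    // Finds the element again by exact match on the marker value.
    // GUID strings contain only hex digits and hyphens, so quoting is safe.
    public static XmlElement Find(XmlDocument doc, string id)
    {
        return (XmlElement)doc.SelectSingleNode("//*[@" + TagAttribute + "='" + id + "']");
    }
}
```

Because the lookup doesn't depend on position, it still finds the element after it has been moved, which is exactly where a positional XPath would go stale.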

0

Andrei May 18 '16 at 14:23

    public static string GetFullPath(this XmlNode node)
    {
        if (node.ParentNode == null)
        {
            return "";
        }
        else
        {
            // path of the parent, then this node's own name, separated by "/"
            return $"{GetFullPath(node.ParentNode)}/{node.Name}";
        }
    }

0

Mabrouk MAHDHI Jun 29 '17 at 8:26

I had to do this recently. Only elements needed to be considered. Here's what I came up with:

    private string GetPath(XmlElement el)
    {
        List<string> pathList = new List<string>();
        XmlNode node = el;
        while (node is XmlElement)
        {
            pathList.Add(node.Name);
            node = node.ParentNode;
        }
        pathList.Reverse();
        string[] nodeNames = pathList.ToArray();
        return String.Join("/", nodeNames);
    }

0

Art Apr 28 '18 at 14:55

Jon Skeet · Accepted Answer · 2008-10-27 20:35

Well, I couldn't resist. This will only work for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise, there may well be a cleaner way of doing it.

It's not strictly necessary to include an index on every element (especially the root!), but it's easier than trying to work out whether there's any ambiguity otherwise.

    using System;
    using System.Text;
    using System.Xml;

    class Test
    {
        static void Main()
        {
            string xml = @"
    <root>
      <foo />
      <foo>
         <bar attr='value'/>
         <bar other='va' />
      </foo>
      <foo><bar /></foo>
    </root>";
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            XmlNode node = doc.SelectSingleNode("//@attr");
            Console.WriteLine(FindXPath(node));
            Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
        }

        static string FindXPath(XmlNode node)
        {
            StringBuilder builder = new StringBuilder();
            while (node != null)
            {
                switch (node.NodeType)
                {
                    case XmlNodeType.Attribute:
                        builder.Insert(0, "/@" + node.Name);
                        node = ((XmlAttribute) node).OwnerElement;
                        break;
                    case XmlNodeType.Element:
                        int index = FindElementIndex((XmlElement) node);
                        builder.Insert(0, "/" + node.Name + "[" + index + "]");
                        node = node.ParentNode;
                        break;
                    case XmlNodeType.Document:
                        return builder.ToString();
                    default:
                        throw new ArgumentException("Only elements and attributes are supported");
                }
            }
            throw new ArgumentException("Node was not in a document");
        }

        static int FindElementIndex(XmlElement element)
        {
            XmlNode parentNode = element.ParentNode;
            if (parentNode is XmlDocument)
            {
                return 1;
            }
            XmlElement parent = (XmlElement) parentNode;
            int index = 1;
            foreach (XmlNode candidate in parent.ChildNodes)
            {
                if (candidate is XmlElement && candidate.Name == element.Name)
                {
                    if (candidate == element)
                    {
                        return index;
                    }
                    index++;
                }
            }
            throw new ArgumentException("Couldn't find element within parent");
        }
    }
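If you do want to drop the index where there's no ambiguity, one possible sketch (assuming C# 7 pattern matching; MinimalIndex and its methods are illustrative names, and like the code above it handles only elements and attributes) emits "[n]" only when an element has same-named sibling elements:

```csharp
using System;
using System.Xml;

public static class MinimalIndex
{
    // 1-based index among same-named sibling elements, or 0 when the name is unique
    // among its siblings, so the caller can omit the "[n]" predicate.
    public static int FindIndexIfAmbiguous(XmlElement element)
    {
        XmlNode parent = element.ParentNode;
        if (parent == null || parent is XmlDocument)
        {
            return 0; // the document element is always unique
        }
        int count = 0, position = 0;
        foreach (XmlNode candidate in parent.ChildNodes)
        {
            if (candidate is XmlElement && candidate.Name == element.Name)
            {
                count++;
                if (candidate == element) position = count;
            }
        }
        return count > 1 ? position : 0;
    }

    public static string FindXPath(XmlNode node)
    {
        if (node is XmlAttribute attr)
            return FindXPath(attr.OwnerElement) + "/@" + attr.Name;
        if (node is XmlDocument)
            return "";
        if (node is XmlElement element)
        {
            int index = FindIndexIfAmbiguous(element);
            string step = "/" + element.Name + (index > 0 ? "[" + index + "]" : "");
            return FindXPath(element.ParentNode) + step;
        }
        throw new ArgumentException("Only elements and attributes in a document are supported");
    }
}
```

With the sample document above this gives /root/foo[2]/bar[1]/@attr for the attribute and /root/foo[3]/bar for the lone <bar> in the last <foo>.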
