XML document splitting algorithm

Question


I want to split an XML document into several XML documents at a node with a specified name (similar to string.Split(...)).

For example, I have the following XML document:

 <root>
   <nodeA> Hello </nodeA>
   <nodeA>
     <nodeB> node b Text </nodeB>
     <nodeImage> image.jpg </nodeImage>
   </nodeA>
   <nodeA> node a text </nodeA>
 </root>

I want to split this XML document into 3 parts at 'nodeImage' while keeping the original XML structure (note: a node named "nodeImage" can appear anywhere):
1. XML up to nodeImage
2. XML containing nodeImage
3. XML after nodeImage

For the sample XML, the results should be:

XML document 1:

 <root> <nodeA> Hello </nodeA> <nodeA> <nodeB> node b Text </nodeB> </nodeA> </root>

XML document 2:

 <root> <nodeA> <nodeImage> image.jpg </nodeImage> </nodeA> </root>

XML document 3:

 <root> <nodeA> node a text </nodeA> </root>

Does anyone know if there is a good algorithm or existing code sample for this requirement?

Update:
If the XML document contains only one node named "nodeImage", it should always be split into three XML documents.


c# algorithm xml

Alex cube Aug 13 '13 at 9:24


5 answers

Giannis paraskevopoulos · Answer 1 · 2013-08-13T09:29:04+0000

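    using System.Xml.Linq;

    XElement xe = XElement.Load(XMLFile);
    int i = 0;
    // one output document per nodeA element
    foreach (XElement newXE in xe.Elements("nodeA"))
    {
        // wrap the nodeA in its own <root> and save it to a separate file
        // (hypothetical file names part1.xml, part2.xml, ... so that no save overwrites another)
        XElement root = new XElement("root", newXE);
        root.Save("part" + (++i) + ".xml");
    }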

Tormod · Answer 2 · 2013-08-13T11:46:28+0000

The term "split" is a bit confusing: splitting at a single occurrence usually yields two parts, not three.

First, let me restate your question in LINQ to XML terms. For each element in XDocument.Descendants("nodeImage"), you want to create:

A copy of the document in which the nodeImage element and all nodes after it have been removed from its parent, and in which every ancestor has also had all of its following nodes removed.
A copy of the document in which the nodeImage element and all of its ancestors have had their preceding and following sibling nodes (XNode.NodesBeforeSelf() and XNode.NodesAfterSelf()) removed.
Then repeat the check on a copy of the XDocument in which the nodeImage element, all nodes before it, and all nodes before each of its ancestors have been removed.
If no occurrence is found, the document being checked is returned in full.

Deep-copying an XDocument is simple, because it has a copy constructor. Of course, this wastes memory if your XML is large.

The challenge, however, is to find your node in each copy. This question shows how to get the XPath of an element; you can use that.
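Tormod's recipe can be sketched in LINQ to XML roughly as follows. This is only a sketch under a few assumptions: the names XmlSplitter, SplitAll and Cut are hypothetical; the element is located in each copy by its position in Descendants(name), a simpler stand-in for the XPath lookup mentioned above; the split element is assumed not to be the root element; and ancestors left empty by the trimming are dropped, which is not part of the recipe but makes the output match the three documents expected in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; names are illustrative only.
static class XmlSplitter
{
    // Splits doc at every element called `name`, in document order, yielding a
    // "before" and a "containing" document per occurrence; the remainder is
    // returned in full once no further occurrence is found.
    public static IEnumerable<XDocument> SplitAll(XDocument doc, XName name)
    {
        var rest = doc;
        while (rest.Descendants(name).Any())
        {
            yield return Cut(rest, name, keepBefore: true, keepSelf: false, keepAfter: false);
            yield return Cut(rest, name, keepBefore: false, keepSelf: true, keepAfter: false);
            rest = Cut(rest, name, keepBefore: false, keepSelf: false, keepAfter: true);
        }
        yield return rest;
    }

    // Deep-copies doc and trims the copy around the first element called `name`.
    // Assumption: that element is not the root element.
    static XDocument Cut(XDocument doc, XName name, bool keepBefore, bool keepSelf, bool keepAfter)
    {
        var copy = new XDocument(doc);               // deep copy via the copy constructor
        var target = copy.Descendants(name).First(); // same element, found in the copy
        var parent = target.Parent;

        // trim the siblings of the target and of every ancestor below the root
        var path = new[] { target }
            .Concat(target.Ancestors().Where(a => a.Parent != null))
            .ToList();
        foreach (var e in path)
        {
            if (!keepBefore) e.NodesBeforeSelf().Remove();
            if (!keepAfter) e.NodesAfterSelf().Remove();
        }
        if (!keepSelf) target.Remove();

        // drop ancestors (never the root) that the trimming left empty
        while (parent != null && parent.Parent != null && !parent.Nodes().Any())
        {
            var next = parent.Parent;
            parent.Remove();
            parent = next;
        }
        return copy;
    }
}

class Program
{
    static void Main()
    {
        var doc = XDocument.Parse(
            "<root> <nodeA> Hello </nodeA> <nodeA> <nodeB> node b Text </nodeB> " +
            "<nodeImage> image.jpg </nodeImage> </nodeA> <nodeA> node a text </nodeA> </root>");
        foreach (var part in XmlSplitter.SplitAll(doc, "nodeImage"))
            Console.WriteLine(part.ToString(SaveOptions.DisableFormatting));
    }
}
```

For the sample document this prints the three documents from the question, one per line. Each part is cut from a full deep copy, so memory use grows with the document size times the number of parts, which is the trade-off noted above.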

Roman pekar · Answer 3 · 2013-08-13T10:24:06+0000

Something like this, using System.Xml.Linq?

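    using System.Collections.Generic;
    using System.Xml.Linq;

    var doc = XDocument.Parse(stringxml);
    var res = new List<XElement>();
    var cur = new XElement("root");
    foreach (var node in doc.Element("root").Elements("nodeA"))
    {
        if (node.Element("nodeImage") == null)
        {
            // no nodeImage here: the nodeA belongs to the current part
            cur.Add(node);
        }
        else
        {
            // close the current part, then put the whole nodeA that
            // contains nodeImage (including its other children) into a part of its own
            res.Add(cur);
            res.Add(new XElement("root", node));
            cur = new XElement("root");
        }
    }
    res.Add(cur);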

Nayan · Answer 4 · 2013-08-13T10:25:24+0000

This works, but test it extensively.

    using System;
    using System.Collections.Generic;
    using System.Xml;

    var doc = new XmlDocument();
    doc.LoadXml(@"<root> <nodeA> Hello </nodeA> <nodeA> <nodeB> node b Text </nodeB> <nodeImage> image.jpg </nodeImage> </nodeA> <nodeA> node a text </nodeA></root>");
    var xmlFrags = new List<string>();
    string xml = "<root>";
    bool bNewFragment = true;
    foreach (XmlNode nodeA in doc.SelectNodes("//root/nodeA"))
    {
        XmlNode nodeImage = nodeA.SelectSingleNode("nodeImage");
        if (nodeImage != null)
        {
            // start a fresh fragment if the previous one was just closed
            if (bNewFragment)
                xml = "<root>";
            // close the current fragment with this nodeA, minus its nodeImage
            xml += "<nodeA>";
            foreach (XmlNode xn in nodeA.ChildNodes)
            {
                if (xn != nodeImage)
                    xml += xn.OuterXml;
            }
            xml += "</nodeA></root>";
            xmlFrags.Add(xml);
            // the nodeImage gets a fragment of its own, wrapped in root/nodeA
            xml = "<root><nodeA>" + nodeImage.OuterXml + "</nodeA></root>";
            xmlFrags.Add(xml);
            bNewFragment = true;
        }
        else
        {
            if (bNewFragment)
            {
                xml = "<root>";
                bNewFragment = false;
            }
            xml += nodeA.OuterXml;
        }
    }
    if (!bNewFragment)
    {
        xml += "</root>";
        xmlFrags.Add(xml);
    }
    // use the XML fragments as you like
    foreach (var xmlFrag in xmlFrags)
        Console.WriteLine(xmlFrag + Environment.NewLine);

Alex filipovici · Answer 5 · 2013-08-13T10:55:10+0000

Try the following:

    using System;
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            // create the XML documents
            XmlDocument doc1 = new XmlDocument(), doc2 = new XmlDocument(), doc3 = new XmlDocument();
            // load the initial XML into doc1
            doc1.Load("input.xml");
            // give doc2 and doc3 the same XML declaration (if any) and an empty copy of the root
            if (doc1.FirstChild is XmlDeclaration decl)
            {
                doc2.AppendChild(doc2.ImportNode(decl, false));
                doc3.AppendChild(doc3.ImportNode(decl, false));
            }
            doc2.AppendChild(doc2.ImportNode(doc1.DocumentElement, false));
            doc3.AppendChild(doc3.ImportNode(doc1.DocumentElement, false));
            // select the nodeImage
            var nodeImage = doc1.SelectSingleNode("//nodeImage");
            if (nodeImage != null)
            {
                XmlNode parent = nodeImage.ParentNode;
                // move everything after nodeImage's parent from doc1 into doc3
                while (parent.NextSibling != null)
                {
                    doc3.DocumentElement.AppendChild(doc3.ImportNode(parent.NextSibling, true));
                    parent.ParentNode.RemoveChild(parent.NextSibling);
                }
                // copy nodeImage's parent into doc2, keeping only nodeImage inside it
                var n2 = doc2.ImportNode(parent, false);
                n2.AppendChild(doc2.ImportNode(nodeImage, true));
                doc2.DocumentElement.AppendChild(n2);
                // remove nodeImage from doc1
                parent.RemoveChild(nodeImage);
            }
            Console.WriteLine(doc1.InnerXml);
            Console.WriteLine(doc2.InnerXml);
            Console.WriteLine(doc3.InnerXml);
        }
    }

