XML: get the position of an element and jump straight to it later

I have a huge XML file (a Wikipedia XML dump).

My school project requires that I be able to search this XML file quickly (so no, importing it into an SQL database is not an option).

So, naturally, I want to create an index kept in a separate file (possibly XML) with entries like this: [content to search]: [byte offset to the beginning of the XML node containing the content]

My question is: how can I get the position of an element, and how can I jump to that position in the XML when a search needs it?

The project is in C#. Thank you in advance.

Later edit: I'm trying to work with XmlReader, but I'm open to any other suggestions.

This is how I currently read the XML for the non-indexed search:

    XmlReader reader = XmlReader.Create(FileName);
    while (reader.Read())
    {
        // React only to opening tags: reader.Name is also "page" on </page>.
        if (reader.NodeType != XmlNodeType.Element)
        {
            continue;
        }
        switch (reader.Name)
        {
            case "page":
                Boolean found = false;
                String title = "";
                String element = "<details>";
                readMore(reader, "title");
                title = reader.Value;
                if (title.Contains(word))
                {
                    found = true;
                }
                readMore(reader, "text");
                String content = reader.Value;
                if (content.Contains(word) && !found)
                {
                    found = true;
                }
                if (found)
                {
                    element += "<summary>" + title + " (click)</summary>";
                    element += content;
                    element += "</details>";
                    result.Add(element);
                }
                break;
        }
    }
    reader.Close();
    if (result.Count == 0)
    {
        result.Add("No results were found");
    }
    return result;

    …

    // Advance to the next node called `name`, then step onto its content.
    // The Read() check stops the loop at end of file.
    static void readMore(XmlReader reader, String name)
    {
        while (reader.Name != name && reader.Read())
        {
        }
        reader.Read();
    }
1 answer

The correct solution would be to use an intermediate binary format. If you cannot do that, and assuming you are using the DOM, I see no solution other than saving the position of the node in the DOM tree as a list of indexes.

An example in JavaScript (it should be much the same in C#):

    function getPosition(node) {
        var pos = [], i = 0;
        while (node != document.documentElement) {
            if (node.previousSibling) {
                ++i;
                node = node.previousSibling;
            } else {
                pos.unshift(i);
                i = 0;
                node = node.parentNode;
            }
        }
        return pos;
    }

    function getNode(pos) {
        var node = document.documentElement;
        for (var i = 0; i < pos.length; ++i) {
            node = node.childNodes[pos[i]];
        }
        return node;
    }
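For the byte-offset index that the question describes, a rough sketch in the same JavaScript might look like the following. It is not part of the original answer. It assumes Node.js, UTF-8 input, literal `<page>` and `</page>` tags, and a sample small enough to load into memory for indexing (a real dump would have to be scanned in chunks). The helper names `buildIndex` and `readPageAt` and the sample file are made up for illustration. The same idea should carry over to C# by seeking a `FileStream` to the stored offset.

```javascript
// Sketch only: build a "title -> byte offset" index for <page> elements,
// then jump straight to one page with a positioned read.
const fs = require("fs");
const os = require("os");
const path = require("path");

const CLOSE_TAG = "</page>";

// Hypothetical helper: scan the raw bytes and record where each page starts.
function buildIndex(buffer) {
  const index = new Map();
  let start = buffer.indexOf("<page>");
  while (start !== -1) {
    const end = buffer.indexOf(CLOSE_TAG, start);
    if (end === -1) break; // unterminated page: stop indexing
    const page = buffer.toString("utf8", start, end);
    const match = /<title>([^<]*)<\/title>/.exec(page);
    if (match) {
      // Offsets are in bytes, not characters, so they can be used for seeking.
      index.set(match[1], {
        offset: start,
        length: end + CLOSE_TAG.length - start,
      });
    }
    start = buffer.indexOf("<page>", end);
  }
  return index;
}

// Hypothetical helper: read one page without touching the rest of the file.
function readPageAt(fileName, entry) {
  const fd = fs.openSync(fileName, "r");
  try {
    const chunk = Buffer.alloc(entry.length);
    fs.readSync(fd, chunk, 0, entry.length, entry.offset);
    return chunk.toString("utf8");
  } finally {
    fs.closeSync(fd);
  }
}

// Made-up sample data; the "é" makes byte and character offsets differ.
const sample =
  "<mediawiki>" +
  "<page><title>Café</title><text>coffee house</text></page>" +
  "<page><title>XML</title><text>markup language</text></page>" +
  "</mediawiki>";
const fileName = path.join(os.tmpdir(), "dump-sample.xml");
fs.writeFileSync(fileName, sample, "utf8");

const index = buildIndex(fs.readFileSync(fileName));
const xmlPage = readPageAt(fileName, index.get("XML"));
console.log(index.get("XML").offset, xmlPage);
```

The index stores a length next to each offset so that a lookup can read exactly one page. The fragment that comes back can then be handed to whatever XML parser the search uses.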

Source: https://habr.com/ru/post/1446399/