Why is this XML file loading slowly?

I have some very simple code:

    XmlDocument doc = new XmlDocument();
    Console.WriteLine("loading");
    doc.Load(url);
    Console.WriteLine("loaded");
    XmlNodeList nodeList = doc.GetElementsByTagName("p");
    foreach (XmlNode node in nodeList)
    {
        Console.WriteLine(node.ChildNodes[0].Value);
    }
    return source;

I am working with this file, and loading it takes two minutes. Why so long? I tried both fetching the file over the network and loading a local copy.

+4
2 answers

I imagine it is the DTD that takes so long to load. Since the DTD defines entities, you should not disable DTD processing, so you are probably better off not going down this route.

Given the inner workings of the Wikipedia parser (a right mess), I would say it is a big leap to assume that it will produce well-formed XHTML every time.

Use the HTML Agility Pack to parse the page instead (you can then convert the result to an XmlDocument fairly easily if necessary, IIRC).
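
As a rough sketch of the HTML Agility Pack route: the snippet below assumes the HtmlAgilityPack NuGet package is referenced and that you already have the page markup as a string. `ParagraphReader` and `GetParagraphTexts` are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET

static class ParagraphReader
{
    // Hypothetical helper: returns the text of every <p> element in the markup.
    public static List<string> GetParagraphTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant HTML parse, input need not be well-formed

        var texts = new List<string>();
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return texts; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode p in paragraphs)
        {
            // InnerText keeps entities such as &nbsp; as written; DeEntitize decodes them.
            texts.Add(HtmlEntity.DeEntitize(p.InnerText));
        }
        return texts;
    }
}
```

Unlike `node.ChildNodes[0].Value` in the question, `InnerText` returns the text of the whole paragraph, nested tags included, so adjust that if you only want the first child. As far as I know, no DTD is fetched here, since the parser does not treat the input as XML.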

If you really want to go down the XmlDocument route, you can keep a local cache of the XHTML DTDs. See this post, this post and this post for details.
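
One way to avoid writing the cache yourself, sketched below, is the `XmlPreloadedResolver` class in `System.Xml.Resolvers` (.NET Framework 4.0 and later), which ships with copies of the XHTML 1.0 DTDs and entity files. Note that this swaps in a built-in resolver rather than the hand-rolled cache the linked posts describe, and it assumes the page really declares an XHTML 1.0 DOCTYPE. `XhtmlLoader` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

static class XhtmlLoader
{
    // Hypothetical helper: loads XHTML 1.0 markup without fetching DTDs from w3.org.
    public static XmlDocument Load(TextReader input)
    {
        // Serves the XHTML 1.0 DTDs and entity files from copies built into the framework.
        var resolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // keep the DTD so entities like &nbsp; resolve
            XmlResolver = resolver
        };

        // Assumption: setting the resolver on the document as well keeps it
        // from falling back to a network resolver for the DOCTYPE.
        var doc = new XmlDocument { XmlResolver = resolver };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

The helper takes a `TextReader` instead of a URL on purpose: with no fallback resolver, `XmlPreloadedResolver` only serves the preloaded DTDs, so download the page yourself first and pass in the result, for example `XhtmlLoader.Load(new StringReader(pageText))`.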

+9

This is because XmlDocument does not just load your XML into a class hierarchy; it also goes and fetches the DTD and entity files referenced by the document. Run Fiddler and you will see the calls to retrieve:

    http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
    http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
    http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
    http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent

It took me about 20 seconds.
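
If you would rather see this from code than from Fiddler, a small sketch is to hand `XmlDocument` a resolver that logs every resource it is asked to open. `LoggingResolver` and `ResolverDemo` are illustrative names, and the timing only covers opening each resource, not necessarily reading it to the end.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Xml;

// Hypothetical helper: records and times every resource the XML loader requests.
class LoggingResolver : XmlUrlResolver
{
    public readonly List<Uri> Requested = new List<Uri>();

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        Requested.Add(absoluteUri);
        Stopwatch timer = Stopwatch.StartNew();
        try
        {
            // Delegate to the normal URL resolver so behaviour is unchanged.
            return base.GetEntity(absoluteUri, role, ofObjectToReturn);
        }
        finally
        {
            Console.WriteLine("{0} opened in {1} ms", absoluteUri, timer.ElapsedMilliseconds);
        }
    }
}

static class ResolverDemo
{
    // Loads the document as in the question, but through the logging resolver.
    public static XmlDocument LoadWithLogging(string url)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = new LoggingResolver();
        doc.Load(url);
        return doc;
    }
}
```

For an XHTML page, the log should list the page itself plus whichever DTD and `.ent` files the loader went out to fetch.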

+5

Source: https://habr.com/ru/post/1348016/

