Fastest way to parse XML files in C#?

I need to download a lot of XML files from the Internet. To test for the best speed, I downloaded all of them (more than 500 files) in advance. They have the following format:

<player-profile>
  <personal-information>
    <id>36</id>
    <fullname>Adam Gilchrist</fullname>
    <majorteam>Australia</majorteam>
    <nickname>Gilchrist</nickname>
    <shortName>A Gilchrist</shortName>
    <dateofbirth>Nov 14, 1971</dateofbirth>
    <battingstyle>Left-hand bat</battingstyle>
    <bowlingstyle>Right-arm offbreak</bowlingstyle>
    <role>Wicket-Keeper</role>
    <teams-played-for>Western Australia, New South Wales, ICC World XI, Deccan Chargers, Australia</teams-played-for>
    <iplteam>Deccan Chargers</iplteam>
  </personal-information>
  <batting-statistics>
    <odi-stats>
      <matchtype>ODI</matchtype>
      <matches>287</matches>
      <innings>279</innings>
      <notouts>11</notouts>
      <runsscored>9619</runsscored>
      <highestscore>172</highestscore>
      <ballstaken>9922</ballstaken>
      <sixes>149</sixes>
      <fours>1000+</fours>
      <ducks>0</ducks>
      <fifties>55</fifties>
      <catches>417</catches>
      <stumpings>55</stumpings>
      <hundreds>16</hundreds>
      <strikerate>96.95</strikerate>
      <average>35.89</average>
    </odi-stats>
    <test-stats>
      .
      .
      .
    </test-stats>
    <t20-stats>
      .
      .
      .    
    </t20-stats>
    <ipl-stats>
      .
      .
      . 
    </ipl-stats>
  </batting-statistics>
  <bowling-statistics>
    <odi-stats>
      <matchtype>ODI</matchtype>
      <matches>378</matches>
      <ballsbowled>58</ballsbowled>
      <runsgiven>64</runsgiven>
      <wickets>3</wickets>
      <fourwicket>0</fourwicket>
      <fivewicket>0</fivewicket>
      <strikerate>19.33</strikerate>
      <economyrate>6.62</economyrate>
      <average>21.33</average>
    </odi-stats>
    <test-stats>
      .
      .
      . 
    </test-stats>
    <t20-stats>
      .
      .
      . 
    </t20-stats>
    <ipl-stats>
      .
      .
      . 
    </ipl-stats>
  </bowling-statistics>
</player-profile>

I use:

XmlNodeList list = _document.SelectNodes("/player-profile/batting-statistics/odi-stats");

and then loop over the list with foreach:

foreach (XmlNode stats in list)
  {
     _btMatchType = GetInnerString(stats, "matchtype"); //it returns null if the node is not available
     .
     .
     .
     .
     _btAvg = Convert.ToDouble(stats["average"].InnerText);
  }
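The question doesn't show GetInnerString. A hypothetical sketch of a helper with the described behavior (null when the node is missing), using XmlNode's string indexer, which returns the first child element with that name or null:

```csharp
using System.Xml;

// Hypothetical sketch: the real GetInnerString is not shown in the question,
// so this only mirrors its described behavior.
static class XmlHelpers
{
    // Returns the inner text of the first child element named childName,
    // or null if there is no such child.
    public static string GetInnerString(XmlNode parent, string childName)
    {
        // XmlNode's string indexer returns the first matching child element, or null.
        XmlElement child = parent[childName];
        return child?.InnerText;
    }
}
```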

Even though I downloaded all the files for offline use, parsing is very slow. Is there a faster way to parse them? Or is the problem on the SQL side? I save all the data extracted from the XML to the database using DataSets and TableAdapters.

EDIT: I'm now trying XmlReader. Could someone provide XmlReader code for the document above? So far I have this:

void Load(string url) 
{
    _reader = XmlReader.Create(url); 
    while (_reader.Read()) 
    { 
    } 
} 

With XmlReader, though, I can't figure out how to tell the odi, t20, ipl, etc. sections apart.

Whatever is making your program slow, the XML-handling code itself can be made much simpler.

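You don't need a separate line of code for each element. Instead, loop over the child elements and keep the values you care about in a dictionary, for example: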

Dictionary<string, string> map = new Dictionary<string, string>
{
  { "matchtype", null },
  { "matches", null },
  { "ballsbowled", null }
};

foreach (XmlElement elm in stats.SelectNodes("*"))
{
   if (map.ContainsKey(elm.Name))
   {
      map[elm.Name] = elm.InnerText;
   }
}
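The same loop, wrapped into a reusable method so it can be called once per stats node. The class and method names (StatsReader, ReadFields) are illustrative, not from the answer:

```csharp
using System.Collections.Generic;
using System.Xml;

static class StatsReader
{
    // Collects the text of the wanted child elements of one stats node.
    // Keys whose value is still null afterwards were missing from the XML;
    // elements that are not in 'wanted' are ignored.
    public static Dictionary<string, string> ReadFields(XmlNode stats, IEnumerable<string> wanted)
    {
        var map = new Dictionary<string, string>();
        foreach (string name in wanted)
        {
            map[name] = null;
        }

        foreach (XmlElement elm in stats.SelectNodes("*"))
        {
            if (map.ContainsKey(elm.Name))
            {
                map[elm.Name] = elm.InnerText;
            }
        }
        return map;
    }
}
```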

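After the loop, each key in the map holds the text of the matching element, or null if the element wasn't in the XML (so you no longer need a helper like GetInnerString).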

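If you're loading the data into a DataTable whose column names match the XML element names, you can use DataTable.Columns as the map. And since each DataColumn knows its data type, you can convert the values as you go: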

foreach (XmlElement elm in stats.SelectNodes("*"))
{
   if (myTable.Columns.Contains(elm.Name))
   {
      DataColumn c = myTable.Columns[elm.Name];
      if (c.DataType == typeof(string))
      {          
         myRow[elm.Name] = elm.InnerText;
         continue;
      }
      if (c.DataType == typeof(double))
      {
         myRow[elm.Name] = Convert.ToDouble(elm.InnerText);
         continue;
      }
      throw new InvalidOperationException("I didn't implement conversion logic for " + c.DataType.ToString() + ".");
   }
}
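A self-contained sketch of the same column-driven idea. One deliberate swap: instead of an if block per type, it uses Convert.ChangeType with the column's DataType, and InvariantCulture so "96.95" parses the same on any machine locale. TableFiller and AddRow are hypothetical names. Values that don't fit the column type (such as "1000+" in an int column) will throw.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Xml;

static class TableFiller
{
    // Adds one row to 'table' from the child elements of 'stats'.
    // Only elements whose name matches a column are copied; the column's
    // DataType decides how the text is converted.
    public static DataRow AddRow(DataTable table, XmlNode stats)
    {
        DataRow row = table.NewRow();
        foreach (XmlElement elm in stats.SelectNodes("*"))
        {
            if (!table.Columns.Contains(elm.Name))
                continue;

            DataColumn c = table.Columns[elm.Name];
            // Convert.ChangeType covers string, int, double, DateTime, ...
            row[c] = Convert.ChangeType(elm.InnerText, c.DataType, CultureInfo.InvariantCulture);
        }
        table.Rows.Add(row);
        return row;
    }
}
```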

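This way the table's columns drive the parsing: adding a column for another element is enough for the loop to pick it up.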

Edit

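I've been thinking about this some more. This kind of thing comes naturally in Python; it takes a bit more code in C#, but it's still worth doing.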

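If you don't have a DataColumn to tell you the type, you can keep the types yourself, in a dictionary that maps each element name to a type: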

Dictionary<string, Type> typeMap = new Dictionary<string, Type>
{
   { "matchtype", typeof(string) },
   { "matches", typeof(int) },
   { "ballsbowled", typeof(int) }
};

Then, inside the loop, you convert each value according to its type:

if (typeMap[elm.Name] == typeof(int))
{
   result[elm.Name] = Convert.ToInt32(elm.InnerText);
   continue;
}

Since the values are no longer all strings, the result can't be a Dictionary<string, string>; use a Dictionary<string, object> instead.

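This still needs an if block ending in continue for every type, which gets repetitive. Is there a way around that? There is: map each type to a conversion function: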

Dictionary<Type, Func<string, object>> conversionMap = 
   new Dictionary<Type, Func<string, object>>
{
   { typeof(string), (x => x) },
   { typeof(int), (x => Convert.ToInt32(x)) },
   { typeof(double), (x => Convert.ToDouble(x)) },
   { typeof(DateTime), (x => Convert.ToDateTime(x)) }
};

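It's a dictionary of functions. A Func<string, object> is a function that takes a string and returns an object. Each entry is a lambda expression: the part before the arrow, (x), is the argument, and the part after it is the result. (How does the compiler know that x is a string? Because the dictionary's value type is Func<string, object>.)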

With that in place, the loop body becomes a single line:

result[elm.Name] = conversionMap[typeMap[elm.Name]](elm.InnerText);

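That is: look up the element's type in typeMap, look up the conversion function for that type in conversionMap, and call it on the element's text.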
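Putting typeMap and conversionMap together into one runnable sketch. Two assumptions beyond the answer: elements missing from typeMap are skipped instead of throwing, and the converters use int.Parse/double.Parse/DateTime.Parse with InvariantCulture rather than Convert.ToXxx, so decimal points parse the same on every locale. TableDriven and Read are illustrative names, and typeMap would need an entry for every element you care about.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

static class TableDriven
{
    // Element name -> target type (extend with the remaining elements).
    static readonly Dictionary<string, Type> TypeMap = new Dictionary<string, Type>
    {
        { "matchtype", typeof(string) },
        { "matches", typeof(int) },
        { "ballsbowled", typeof(int) },
        { "average", typeof(double) },
    };

    // Target type -> function converting the element text to that type.
    static readonly Dictionary<Type, Func<string, object>> ConversionMap =
        new Dictionary<Type, Func<string, object>>
    {
        { typeof(string), x => x },
        { typeof(int), x => int.Parse(x, CultureInfo.InvariantCulture) },
        { typeof(double), x => double.Parse(x, CultureInfo.InvariantCulture) },
        { typeof(DateTime), x => DateTime.Parse(x, CultureInfo.InvariantCulture) },
    };

    public static Dictionary<string, object> Read(XmlNode stats)
    {
        var result = new Dictionary<string, object>();
        foreach (XmlElement elm in stats.SelectNodes("*"))
        {
            // Skip elements we have no type for.
            if (!TypeMap.TryGetValue(elm.Name, out Type type))
                continue;
            result[elm.Name] = ConversionMap[type](elm.InnerText);
        }
        return result;
    }
}
```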

It's a simple technique, but a very useful one. Code Complete has a good discussion of this kind of table-driven approach.

If you're processing many (or large) files, don't use XmlDocument, which builds the whole document tree in memory. Use XmlReader.
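A minimal XmlReader sketch for the player-profile format, which also answers how to tell odi, t20, ipl and batting/bowling apart: keep a stack of the open element names and key each value by its path. ProfileReader and ReadLeaves are hypothetical names; for a file on disk, XmlReader.Create(path, settings) works the same way.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class ProfileReader
{
    // Streams through a player-profile document and returns every text value
    // keyed by its path below the root, e.g.
    // "batting-statistics/odi-stats/matches" -> "287".
    public static Dictionary<string, string> ReadLeaves(TextReader input)
    {
        var values = new Dictionary<string, string>();
        var path = new List<string>(); // names of the currently open elements
        var settings = new XmlReaderSettings { IgnoreWhitespace = true, IgnoreComments = true };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Empty elements (<x/>) produce no EndElement, so don't push them.
                        if (!reader.IsEmptyElement)
                            path.Add(reader.LocalName);
                        break;
                    case XmlNodeType.Text:
                        // path[0] is the root (player-profile); leave it out of the key.
                        values[string.Join("/", path.GetRange(1, path.Count - 1))] = reader.Value;
                        break;
                    case XmlNodeType.EndElement:
                        path.RemoveAt(path.Count - 1);
                        break;
                }
            }
        }
        return values;
    }
}
```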

LINQ to XML is probably the best option here. You could also search Google for the HTML Agility Pack.
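A rough LINQ to XML sketch that turns one statistics block into nested dictionaries (match type, then field name, then text). LinqReader and ReadBlock are hypothetical names; use XDocument.Load(path) for a file. Note that XDocument, like XmlDocument, loads the whole file into memory, which is fine for small files like these, and ToDictionary throws if a stats element repeats a child name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LinqReader
{
    // For one block (e.g. "batting-statistics"), maps each match type element
    // ("odi-stats", "t20-stats", ...) to its child name/value pairs.
    public static Dictionary<string, Dictionary<string, string>> ReadBlock(XDocument doc, string block)
    {
        XElement blockElement = doc.Root.Element(block);
        if (blockElement == null)
            return new Dictionary<string, Dictionary<string, string>>();

        return blockElement.Elements().ToDictionary(
            stats => stats.Name.LocalName,
            stats => stats.Elements().ToDictionary(e => e.Name.LocalName, e => e.Value));
    }
}
```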

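Before optimizing, measure where the time actually goes. Parsing XML files of this size should be fast; the time may be going to input/output or to the database.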
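Since the question asks whether SQL might be the problem, one way to check is to time parsing and saving separately. A rough sketch with Stopwatch; PhaseTimer, Measure and the parse/save delegates are hypothetical stand-ins for your own code:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class PhaseTimer
{
    // Accumulates the time spent in 'parse' and in 'save' across all files.
    // Stopwatch.Start() after Stop() resumes, so each watch sums its phase.
    public static (TimeSpan Parse, TimeSpan Save) Measure<T>(
        IEnumerable<string> files, Func<string, T> parse, Action<T> save)
    {
        var parseWatch = new Stopwatch();
        var saveWatch = new Stopwatch();
        foreach (string file in files)
        {
            parseWatch.Start();
            T data = parse(file);
            parseWatch.Stop();

            saveWatch.Start();
            save(data);
            saveWatch.Stop();
        }
        return (parseWatch.Elapsed, saveWatch.Elapsed);
    }
}
```

If the save time dominates, faster XML parsing won't help much.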

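If all the XML files have the same structure, you can also deserialize them into classes instead of walking the XML yourself.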

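XmlReader is the fastest option. XmlDocument loads the entire XML into memory, which is slower. For example, with 50 … XML files (10 …), XmlDocument was noticeably slow.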
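A middle ground between the two: stream with XmlReader, but materialize each small stats element (odi-stats, t20-stats, ...) as an XElement via XNode.ReadFrom, so only one block is in memory at a time while you still get convenient element access. StatsStreamer and StreamStats are hypothetical names, and the "-statistics"/"-stats" suffix checks assume the naming shown in the question's sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class StatsStreamer
{
    // Yields each "*-stats" element as a small XElement together with the
    // statistics block it appeared in; the rest of the document is streamed.
    public static IEnumerable<(string Block, XElement Stats)> StreamStats(TextReader input)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            string block = null; // e.g. "batting-statistics" or "bowling-statistics"
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.LocalName.EndsWith("-statistics", StringComparison.Ordinal))
                {
                    block = reader.LocalName;
                    reader.Read();
                }
                else if (reader.NodeType == XmlNodeType.Element &&
                         reader.LocalName.EndsWith("-stats", StringComparison.Ordinal))
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the following node, so no extra Read() here.
                    yield return (block, (XElement)XNode.ReadFrom(reader));
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```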

If you're using a DataSet anyway, DataSet.ReadXml() can load the XML directly and create the tables for you.

This toy application does this, and it works with the format you defined above.

Project file: http://www.dot-dash-dot.com/files/wtfxml.zip
Installer: http://www.dot-dash-dot.com/files/WTFXMLSetup_1_8_0.msi

It lets you view and edit your XML file in tree and grid form; the tables it lists are the ones the DataSet creates automatically after ReadXml().
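A minimal sketch of loading XML with DataSet.ReadXml and listing the tables it infers. DataSetDemo, Load and Dump are illustrative names. Schema inference can reject some shapes; element names reused under different parents (odi-stats appears under both statistics blocks here) may cause an inference error, so try it on a real file first.

```csharp
using System;
using System.Data;
using System.IO;

static class DataSetDemo
{
    // Loads an XML document into a DataSet; ReadXml infers tables and columns
    // from the element structure (elements with child elements become tables).
    public static DataSet Load(TextReader input)
    {
        var ds = new DataSet();
        ds.ReadXml(input);
        return ds;
    }

    // Prints each inferred table with its column and row counts.
    public static void Dump(DataSet ds)
    {
        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine($"{table.TableName}: {table.Columns.Count} columns, {table.Rows.Count} rows");
        }
    }
}
```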


Source: https://habr.com/ru/post/1750142/
