DataTable.Select and performance in C#

I import data from three tab-delimited files into DataTables. I then need to go through each row of the master table and find all the matching rows in the two child tables. For each DataRow[] array returned from the child tables, I have to go through each row individually and check values based on different parameters. In the end I need to create a final record that merges columns from the master table and the two child tables.

I have done this and it works, but the problem is performance. I use DataTable.Select to find all the child rows in the child table, which I believe makes it very slow. Note that none of the tables has a primary key, since duplicate rows are acceptable. Currently I have 1200 rows in the master table and around 8000 rows in the child table, and the total time taken is 8 minutes.

Any ideas on how to improve performance? Thanks in advance.

The code is below:

    DataTable rawMasterdt = importMasterFile();
    DataTable rawDespdt = importDescriptionFile();

    dsHelper = new DataSetHelper();
    DataTable distinctdt = new DataTable();
    distinctdt = dsHelper.SelectDistinct("DistinctOffers", rawMasterdt, "C1");

    if (distinctdt.Rows.Count > 0)
    {
        int count = 0;
        foreach (DataRow offer in distinctdt.Rows)
        {
            string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
            DataRow masterRow = rawMasterdt.Select(exp)[0];

            count++;
            txtBlock1.Text = "Importing Offer " + count.ToString() + " of " + distinctdt.Rows.Count.ToString();
            if (masterRow != null)
            {
                Product newProduct = new Product();

                newProduct.Code = masterRow["C4"].ToString();
                newProduct.Name = masterRow["C5"].ToString();
                //  -----
                newProduct.Description = getProductDescription(offer[0].ToString(), rawDespdt);
                newProduct.Weight = getProductWeight(offer[0].ToString(), rawDespdt);
                newProduct.Price = getProductRetailPrice(offer[0].ToString(), rawDespdt);
                newProduct.UnitPrice = getProductUnitPrice(offer[0].ToString(), rawDespdt);
                //  ------- more functions similar to above here

                productList.Add(newProduct);
            }
        }
        txtBlock1.Text = "Import Completed";
    }

    public string getProductDescription(string offercode, DataTable dsp)
    {
        string exp = "((C1 = " + "'" + offercode + "')" + " AND ( C6 = 'c' ))";
        DataRow[] dRows = dsp.Select(exp);
        string descrip = "";
        if (dRows.Length > 0)
        {
            for (int i = 0; i < dRows.Length - 1; i++)
            {
                descrip = descrip + " " + dRows[i]["C12"];
            }
        }
        return descrip;
    }
+3
5 answers

.NET 4.5, and the problem still exists.

Below are the results of a simple test comparing DataTable.Select with various dictionary implementations by CPU time (results in milliseconds):

    #Rows Table.Select  Hashtable[] SortedList[] Dictionary[]
     1000        43,31         0,01         0,06         0,00
     6000       291,73         0,07         0,13         0,01
    11000       604,79         0,04         0,16         0,02
    16000       914,04         0,05         0,19         0,02
    21000      1279,67         0,05         0,19         0,02
    26000      1501,90         0,05         0,17         0,02
    31000      1738,31         0,07         0,20         0,03

Problem:

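The DataTable.Select method internally creates an instance of the "System.Data.Select" class, and this "Select" class creates indexes based on the fields (columns) specified in the query. The Select class reuses the indexes it has created, but the DataTable implementation does not reuse the Select class instance, so the indexes are recreated every time DataTable.Select is called. (This behavior can be observed by decompiling System.Data.)

Solution:

Assume the following query: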

DataRow[] rows = data.Select("COL1 = 'VAL1' AND (COL2 = 'VAL2' OR COL2 IS NULL)");

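Instead, create and fill a dictionary whose keys correspond to the combinations of values in the columns used as the filter. (This relatively expensive step needs to be done only once; the dictionary instance should then be reused.)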

Dictionary<string, List<DataRow>> di = new Dictionary<string, List<DataRow>>();

foreach (DataRow dr in data.Rows)
{
    string key = (dr["COL1"] == DBNull.Value ? "<NULL>" : dr["COL1"]) + "//" + (dr["COL2"] == DBNull.Value ? "<NULL>" : dr["COL2"]);
    if (di.ContainsKey(key))
    {
        di[key].Add(dr);
    }
    else
    {
        di.Add(key, new List<DataRow>());
        di[key].Add(dr);
    }
}

Then query the dictionary (several lookups may be needed) to filter the data:

string key1 = "VAL1//VAL2";
string key2 = "VAL1//<NULL>";
List<DataRow> results = new List<DataRow>();
if (di.ContainsKey(key1))
{
    results.AddRange(di[key1]);
}
if (di.ContainsKey(key2))
{
    results.AddRange(di[key2]);
}
+4

You can speed this up considerably by using a dictionary. For example:

if (distinctdt.Rows.Count > 0)
{
    // build index of C1 values to speed inner loop
    Dictionary<string, DataRow> masterIndex = new Dictionary<string, DataRow>();
    foreach (DataRow row in rawMasterdt.Rows)
        masterIndex[row["C1"].ToString()] = row;

    int count = 0;
    foreach (DataRow offer in distinctdt.Rows)
    {

Then, inside the foreach loop, replace this:

    string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
    DataRow masterRow = rawMasterdt.Select(exp)[0];

with this:

    DataRow masterRow;
    if (masterIndex.ContainsKey(offer[0].ToString()))
        masterRow = masterIndex[offer[0].ToString()];
    else
        masterRow = null;
+3

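If you create a DataRelation between the parent and child DataTables, you can find the child rows by calling DataRow.GetChildRows(DataRelation) on the parent row (or DataRow.GetChildRelName with typed DataSets). This results in a TreeMap lookup, so it should be fast.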
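
A minimal sketch of the DataRelation approach, not a definitive implementation. The table names, the relation name "OfferDetails", the sample rows, and the class name are hypothetical; only the column names C1 and C12 come from the question. The relation is added with createConstraints set to false, on the assumption that the child table may contain keys with no parent row.

```csharp
using System.Data;

public static class DataRelationSketch
{
    // Builds a DataSet with a parent table of distinct offer codes and a
    // child table of detail rows, linked on C1 (sample data is made up).
    public static DataSet BuildDataSet()
    {
        var offers = new DataTable("Offers");
        offers.Columns.Add("C1", typeof(string));
        offers.Rows.Add("A1");
        offers.Rows.Add("B2");

        var details = new DataTable("Details");
        details.Columns.Add("C1", typeof(string));
        details.Columns.Add("C12", typeof(string));
        details.Rows.Add("A1", "first line");
        details.Rows.Add("A1", "second line");
        details.Rows.Add("B2", "other offer");

        var ds = new DataSet();
        ds.Tables.Add(offers);
        ds.Tables.Add(details);

        // false = do not create unique / foreign-key constraints,
        // so orphaned child rows do not cause an exception.
        ds.Relations.Add("OfferDetails",
                         offers.Columns["C1"],
                         details.Columns["C1"],
                         false);
        return ds;
    }

    // Returns the number of child rows for the given offer code,
    // or 0 if no parent row has that code.
    public static int CountChildren(DataSet ds, string offerCode)
    {
        foreach (DataRow parent in ds.Tables["Offers"].Rows)
        {
            if ((string)parent["C1"] == offerCode)
                return parent.GetChildRows("OfferDetails").Length;
        }
        return 0;
    }
}
```

In the question's loop, each row of distinctdt would play the role of the parent row, and GetChildRows would replace the per-offer Select call on the child table.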

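If you need to search for rows by criteria other than a DataRelation's keys, I recommend using DataView.Sort/DataView.FindRows() instead of DataTable.Select() as soon as you query the data more than once. DataView.FindRows() is based on a TreeMap lookup (O(log(N))), whereas DataTable.Select() has to scan all rows (O(N)). More details: http://arnosoftwaredev.blogspot.com/2011/02/when-datatableselect-is-slow-use.html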
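
A minimal sketch of the DataView.Sort/FindRows() idea, assuming the column names C1, C6 and C12 from the question. The sample rows and the class and method names are hypothetical. The view is created once and reused for every lookup.

```csharp
using System.Data;

public static class DataViewLookupSketch
{
    // Small stand-in for the description table (sample data is made up).
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable("Descriptions");
        table.Columns.Add("C1", typeof(string));
        table.Columns.Add("C6", typeof(string));
        table.Columns.Add("C12", typeof(string));
        table.Rows.Add("A1", "c", "first line");
        table.Rows.Add("A1", "c", "second line");
        table.Rows.Add("A1", "w", "1.5");
        table.Rows.Add("B2", "c", "other offer");
        return table;
    }

    // Create the sorted view once; FindRows searches on the Sort columns.
    public static DataView BuildView(DataTable table)
    {
        return new DataView(table) { Sort = "C1, C6" };
    }

    // Joins the C12 values of all rows where C1 = offerCode and C6 = 'c'.
    public static string GetDescription(DataView view, string offerCode)
    {
        DataRowView[] matches = view.FindRows(new object[] { offerCode, "c" });
        var parts = new string[matches.Length];
        for (int i = 0; i < matches.Length; i++)
            parts[i] = matches[i]["C12"].ToString();
        return string.Join(" ", parts);
    }
}
```

The key values passed to FindRows must be given in the same order as the columns in Sort.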

+1

DataTables can be related to one another when they are placed in a DataSet. See http://msdn.microsoft.com/en-us/library/ay82azad%28VS.71%29.aspx for a starting point.

0

Have you run it through a profiler? That should be the first step. In any case, this may help:

  • Read the master text file into memory line by line. Put each master record in a dictionary as the key. Add it to the data set (1 pass through the master file).

  • Read the child text file line by line, add it as a value for the corresponding master record in the dictionary created above

  • Now you have everything in the dictionary in memory, only 1 pass through each file. Make a final pass through the dictionary / children and process each column and do the final calculations.
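
The grouping step in the list above can be sketched roughly as follows. This is an illustration only: it assumes the key is the first tab-separated field of each child line, and the class and method names are hypothetical.

```csharp
using System.Collections.Generic;

public static class OnePassGroupingSketch
{
    // Groups tab-delimited lines by the key in their first field,
    // reading each line exactly once.
    public static Dictionary<string, List<string[]>> GroupByKey(IEnumerable<string> lines)
    {
        var groups = new Dictionary<string, List<string[]>>();
        foreach (string line in lines)
        {
            string[] fields = line.Split('\t');
            List<string[]> bucket;
            if (!groups.TryGetValue(fields[0], out bucket))
            {
                // First time this key is seen: start a new bucket.
                bucket = new List<string[]>();
                groups.Add(fields[0], bucket);
            }
            bucket.Add(fields);
        }
        return groups;
    }
}
```

The lines could come from System.IO.File.ReadLines on the child file. After grouping, the final pass looks up each master key in the dictionary instead of scanning the child table.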

0

Source: https://habr.com/ru/post/1755712/
