In the application I am writing, I have two potentially large data sets that I need to map to each other. One is a list returned from a web service, and the other is a DataTable. For each item in the list, I need to take its ANSI (or ISO) number, find the DataTable row containing that number, and then do something with that row.
Since DataTable.Select is rather slow, and I would have to call it for every item in the list, I experimented with faster alternatives. Keep in mind that the DataTable is not backed by a database, so I cannot use SQL or any other database features.
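For reference, the per-item DataTable.Select lookup I want to avoid looks roughly like this. It is a minimal sketch: the column names `Ansi` and `Material`, the helper `FindBySelect` and the sample data are hypothetical.

```csharp
using System;
using System.Data;

// Hypothetical table layout: one row per product, identified by its ANSI number.
var table = new DataTable("Products");
table.Columns.Add("Ansi", typeof(string));
table.Columns.Add("Material", typeof(string));
for (int i = 0; i < 1000; i++)
{
    table.Rows.Add($"ANSI{i}", $"M{i}");
}

// Baseline: one DataTable.Select call per lookup. Each call evaluates the
// filter expression against the table's rows, so doing this for every item
// in the web-service list costs roughly O(n) per item.
DataRow FindBySelect(DataTable t, string ansi)
{
    // Single quotes are escaped by doubling them inside a filter string literal.
    string filter = $"Ansi = '{ansi.Replace("'", "''")}'";
    DataRow[] matches = t.Select(filter);
    return matches.Length > 0 ? matches[0] : null;
}

DataRow row = FindBySelect(table, "ANSI42");
Console.WriteLine(row?["Material"]);
```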
I thought the quickest way might be to build a dictionary whose key is a KeyValuePair of the number type and the number itself (A for ANSI or I for ISO, plus the number) and whose value is the corresponding DataRow. Building this dictionary obviously takes some processing time, but afterwards I could use the very fast dictionary lookup to find each row I need, and then put the rows back into the table. That way, inside the foreach loop over the list, each lookup would be O(1) with the dictionary instead of O(n) with DataTable.Select.
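A minimal sketch of the index described above, using a `KeyValuePair<string, string>` key as in the text. The column names `Type`, `Number` and `Material` and the sample rows are assumptions; the real table layout is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical columns: "Type" is "A" (ANSI) or "I" (ISO), "Number" is the code.
var table = new DataTable("Products");
table.Columns.Add("Type", typeof(string));
table.Columns.Add("Number", typeof(string));
table.Columns.Add("Material", typeof(string));
table.Rows.Add("A", "100", "Steel");
table.Rows.Add("I", "100", "Copper");
table.Rows.Add("A", "200", "Brass");

// One pass over the table builds the index (O(n) up front)...
var index = new Dictionary<KeyValuePair<string, string>, DataRow>();
foreach (DataRow r in table.Rows)
{
    var key = new KeyValuePair<string, string>((string)r["Type"], (string)r["Number"]);
    index.Add(key, r); // Add throws if the same (type, number) pair appears twice
}

// ...after which each lookup is expected to be O(1) on average, provided
// the key type's GetHashCode spreads the keys across buckets.
DataRow hit = index[new KeyValuePair<string, string>("I", "100")];
Console.WriteLine(hit["Material"]);
```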
To my surprise, the dictionary turned out to be incredibly slow. I could not understand why until I found that using a string key (only the ANSI number) instead of a KeyValuePair key improved performance dramatically, and I mean hundreds of times faster. How on earth is this possible? This is how I tested it:
I generate a list that mimics the output of the web service. From this list I build a dictionary with either a string or a KeyValuePair as the key and a DataRow as the value. I then loop over the list with foreach, look up every item in the dictionary, and assign a value to the returned DataRow. That's it.
If I use KeyValuePair as the dictionary key, the loop takes 1000 seconds; if I change the dictionary to use only the string as the key, it takes a millisecond for 10,000 elements. FYI: I designed the test so that every lookup is a hit, i.e. all the keys are always present.
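A self-contained sketch of the test described above, so it can be reproduced. The mock item shape (a tuple of type letter, ANSI number and material), the column names and the data are assumptions, since the real `ProductList.Products` type and DataTable layout are not shown. The count is reduced to 1,000 so the slow variant finishes quickly; timings will vary by machine.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;

const int count = 1_000; // the timings quoted above are for 10,000

var table = new DataTable();
table.Columns.Add("Ansi", typeof(string));
table.Columns.Add("Material", typeof(string));

// Mock web-service output: alternating "A" (ANSI) and "I" (ISO) items.
var items = new List<(string Type, string Ansi, string Material)>();
for (int i = 0; i < count; i++)
{
    string type = i % 2 == 0 ? "A" : "I";
    string ansi = i.ToString();
    table.Rows.Add(ansi, "");
    items.Add((type, ansi, "M" + i));
}

// Two indexes over the same rows, differing only in the key type.
var byString = new Dictionary<string, DataRow>();
var byPair = new Dictionary<KeyValuePair<string, string>, DataRow>();
for (int i = 0; i < count; i++)
{
    DataRow r = table.Rows[i];
    byString.Add(items[i].Ansi, r);
    byPair.Add(new KeyValuePair<string, string>(items[i].Type, items[i].Ansi), r);
}

// Same work as the timed loop; every key is present, so every lookup hits.
long Time(Func<(string Type, string Ansi, string Material), DataRow> lookup, out int hits)
{
    hits = 0;
    var sw = Stopwatch.StartNew();
    foreach (var item in items)
    {
        DataRow row = lookup(item);
        for (int k = 0; k < 10; k++)
            row["Material"] = item.Material + "a";
        hits++;
    }
    sw.Stop();
    return sw.ElapsedMilliseconds;
}

long stringMs = Time(it => byString[it.Ansi], out int stringHits);
long pairMs = Time(it => byPair[new KeyValuePair<string, string>(it.Type, it.Ansi)], out int pairHits);
Console.WriteLine($"string key: {stringMs} ms, KeyValuePair key: {pairMs} ms");
```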
Here is the block of code I am timing:
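```csharp
// String-key version: dict is a Dictionary<string, DataRow>.
// The KeyValuePair version builds a KeyValuePair key for the lookup instead.
foreach (ProductList.Products item in pList.Output.Products)
{
    DataRow row = dict[item.Ansi];
    // Repeat the assignment to add some measurable work per row.
    for (int i = 0; i < 10; i++)
    {
        row["Material"] = item.Material + "a";
    }
    hits++;
}
```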
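`Dictionary<TKey, TValue>` uses the default equality comparer's `GetHashCode` to place keys in buckets, so one way to narrow the difference down is to count how many distinct hash codes each key type produces for the same sample data. `KeyValuePair<TKey, TValue>` does not override `GetHashCode` or `Equals`, so it falls back to the `ValueType` defaults, whose documentation notes that the result is not likely to be suitable as a hash table key. The sketch compares it with a `ValueTuple` key and a combined string key such as "A:123"; the sample data and the "A:" key format are assumptions for illustration, and the exact counts depend on the runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int count = 10_000;
var pairs = new List<KeyValuePair<string, string>>();
var tuples = new List<(string Type, string Number)>();
var strings = new List<string>();
for (int i = 0; i < count; i++)
{
    string type = i % 2 == 0 ? "A" : "I"; // "A" = ANSI, "I" = ISO
    string number = i.ToString();
    pairs.Add(new KeyValuePair<string, string>(type, number));
    tuples.Add((type, number));
    strings.Add(type + ":" + number); // hypothetical combined string key
}

// Fewer distinct hash codes means more keys share a bucket chain, and each
// lookup then has to call Equals on every colliding key in that chain.
int DistinctHashes<T>(IEnumerable<T> keys) =>
    keys.Select(k => EqualityComparer<T>.Default.GetHashCode(k)).Distinct().Count();

int pairHashes = DistinctHashes(pairs);
int tupleHashes = DistinctHashes(tuples);
int stringHashes = DistinctHashes(strings);
Console.WriteLine($"KeyValuePair: {pairHashes}, ValueTuple: {tupleHashes}, string: {stringHashes} distinct hash codes out of {count}");
```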
So how on earth can the runtime suddenly become hundreds of times longer when I use a Dictionary<KeyValuePair, DataRow> instead of a Dictionary<string, DataRow>?