Convert IEnumerable to a dictionary for performance?

I recently saw a new trend at my firm where we convert an IEnumerable to a dictionary with a simple LINQ transform, as follows:

enumerable.ToDictionary(x => x); 

We basically do this when the collection is used for Contains / access operations, and obviously the dictionary has better performance in such cases.

But I understand that converting an enumerable to a dictionary has its own cost, and I wonder at what point it breaks even (if it does), i.e. at what point the cost of Contains / access on the IEnumerable equals the cost of ToDictionary plus the access / Contains calls.

I would add that there is no database access involved: the enumerable is materialized from a database query, and that's it; the enumerable can also be edited after that.

It would also be interesting to know how the data type affects performance.

The lookup may happen 2-5 times overall, but sometimes only once. I have seen patterns like the following. For an enumerable:

  var element = enumerable.SingleOrDefault(x => x.Id == id); // do something if element is null, or return 

For a dictionary:

  if (dictionary.ContainsKey(x)) // do something if true, else return 

This has been bugging me for quite some time.
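To make the two patterns in the question concrete, here is a minimal sketch of both lookups side by side. The `Item` record and the `id` parameter are hypothetical stand-ins for whatever element type the real collection holds:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type standing in for the question's elements.
public record Item(int Id, string Name);

public static class LookupDemo
{
    // Linear search: scans the sequence, O(n) per call.
    public static Item? FindByEnumerable(IEnumerable<Item> items, int id) =>
        items.SingleOrDefault(x => x.Id == id);

    // Hash lookup: O(1) per call once the dictionary is built.
    public static Item? FindByDictionary(Dictionary<int, Item> byId, int id) =>
        byId.TryGetValue(id, out var item) ? item : null;

    public static void Main()
    {
        var items = new List<Item> { new(1, "a"), new(2, "b"), new(3, "c") };

        // Building the dictionary is the one-time O(n) cost the question asks about.
        var byId = items.ToDictionary(x => x.Id);

        Console.WriteLine(FindByEnumerable(items, 2)?.Name);  // b
        Console.WriteLine(FindByDictionary(byId, 2)?.Name);   // b
        Console.WriteLine(FindByDictionary(byId, 9) is null); // True
    }
}
```

Note that `TryGetValue` combines the `ContainsKey` test and the access into a single lookup, which is the idiomatic form of the dictionary pattern above.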

+6
5 answers

Dictionary performance versus IEnumerable

A Dictionary , when used correctly, is always faster to read from (unless the data set is very small, e.g. around 10 items). There can be overhead when creating it.

Given m as the number of lookups executed against the same object (the figures are approximate):

  • IEnumerable performance (backed by a plain list): O(mn)
    • This is because you have to scan all the elements on every lookup (essentially m * O(n) ).
  • Dictionary performance: O(n) + m * O(1) , i.e. O(m + n)
    • This is because you first have to insert the elements ( O(n) ), after which each lookup is O(1) .

From this it can be seen that Dictionary wins when m > 1 , and IEnumerable wins when m = 1 or m = 0 .
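As a sanity check of the O(mn) versus O(m + n) model above, here is a tiny sketch that plugs in numbers (treating one comparison and one insertion as equal unit costs, which is an assumption):

```csharp
using System;

public static class CostModel
{
    // Rough cost model from the analysis above.
    public static long EnumerableCost(long m, long n) => m * n; // m full scans of n items
    public static long DictionaryCost(long m, long n) => n + m; // build once, then O(1) lookups

    public static void Main()
    {
        const long n = 1000;
        for (long m = 0; m <= 3; m++)
            Console.WriteLine($"m={m}: scan={EnumerableCost(m, n)}, dict={DictionaryCost(m, n)}");
        // m=0 and m=1 favour the plain scan; from m=2 onward the dictionary is ahead.
    }
}
```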

In general, you should:

  • Use Dictionary when performing lookups more than once against the same data set.
  • Use IEnumerable when performing a single lookup.
  • Use IEnumerable when the data set may be too large to fit into memory.
    • Keep in mind that an SQL table can be used like a Dictionary , so you can use it to relieve memory pressure.

Additional considerations

Dictionary uses GetHashCode() to organize its internal state. The performance of a dictionary is strongly tied to the hash code in two ways.

  • A slow GetHashCode() adds overhead every time an item is added, looked up, or removed.
  • Poor hash codes result in a dictionary that no longer has O(1) lookup performance.

Most built-in .NET types (especially value types) have very good hashing algorithms. However, with list-like types (such as string), GetHashCode() is O(k) in the length of the value, because it has to iterate over the whole string. So for such keys the dictionary's lookup cost is effectively O(k) per lookup rather than a flat O(1) , although it is still independent of the number of entries.
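To illustrate the GetHashCode() point, here is a sketch of a custom composite key (the `OrderKey` type and its fields are hypothetical) with a well-behaved hash implementation; hashing only one field, or returning a constant, would pile all entries into the same bucket and degrade lookups toward O(n):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical composite key; a good GetHashCode keeps lookups O(1).
public readonly struct OrderKey : IEquatable<OrderKey>
{
    public int CustomerId { get; }
    public int OrderId { get; }

    public OrderKey(int customerId, int orderId) =>
        (CustomerId, OrderId) = (customerId, orderId);

    public bool Equals(OrderKey other) =>
        CustomerId == other.CustomerId && OrderId == other.OrderId;

    public override bool Equals(object? obj) => obj is OrderKey k && Equals(k);

    // Combining both fields spreads keys across buckets; a constant or
    // single-field hash would cause collisions and slow the dictionary down.
    public override int GetHashCode() => HashCode.Combine(CustomerId, OrderId);
}

public static class HashDemo
{
    public static void Main()
    {
        var d = new Dictionary<OrderKey, string>
        {
            [new OrderKey(1, 10)] = "first",
            [new OrderKey(1, 11)] = "second",
        };
        Console.WriteLine(d[new OrderKey(1, 11)]); // second
    }
}
```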

+7

It depends....

How long is the IEnumerable?

Does enumerating the IEnumerable hit a database?

How often is it accessed?

It would be best to experiment and profile.

+2

If you frequently look up elements in your collection by some key, then the dictionary will definitely be faster, because hashing plus a hash lookup is many times faster than a linear scan. Otherwise, if you don't search the collection much, the conversion is unnecessary, because the conversion time can exceed the cost of one or two searches through the collection.

+1

IMHO, you need to measure this in your environment with representative data. In such cases I just write a quick console application that measures the execution time of the code. For a better measurement, you should execute the same code several times.
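A quick console-app measurement of the kind described might look like the following sketch; the collection size and lookup count are arbitrary assumptions, and a proper benchmark would add warm-up runs and repeat the timings:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Bench
{
    public static void Main()
    {
        const int n = 100_000;  // collection size (arbitrary)
        const int m = 1_000;    // number of lookups (arbitrary)
        var list = Enumerable.Range(0, n).ToList();
        var rng = new Random(42);
        var keys = Enumerable.Range(0, m).Select(_ => rng.Next(n)).ToArray();

        var sw = Stopwatch.StartNew();
        long hits = 0;
        foreach (var k in keys)
            if (list.Contains(k)) hits++;        // O(n) per lookup
        sw.Stop();
        Console.WriteLine($"List.Contains:  {sw.ElapsedMilliseconds} ms ({hits} hits)");

        sw.Restart();
        var dict = list.ToDictionary(x => x);    // one-time O(n) build cost
        hits = 0;
        foreach (var k in keys)
            if (dict.ContainsKey(k)) hits++;     // O(1) per lookup
        sw.Stop();
        Console.WriteLine($"Build + lookup: {sw.ElapsedMilliseconds} ms ({hits} hits)");
    }
}
```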

ADD:

It also depends on the application you are developing. Usually you get a better return on the same time and effort by optimizing other places (avoiding network round-trips, caching, etc.).

0

I will add that you did not tell us what happens each time you enumerate your IEnumerable<> . Is it directly backed by a data collection (e.g. a List<> ), or is it computed on the fly?

If it is the first, then for small collections enumerating them to find the element you want is faster (a dictionary of 3-4 elements is useless; if you want, I can build a benchmark to find the break-even point).

If it is the second, then you need to consider whether "caching" the IEnumerable<> into a collection is a good idea. If so, you can choose between List<> and Dictionary<> , and we are back to step 1: is the IEnumerable small or large?

And there is a third case: if the collection is not memory-backed and is too large for memory, then obviously you cannot put it into a Dictionary<> . Then maybe it's time to get SQL to work for you :-)

I will add that misses have their own cost: in a List<> , if you look for an item that does not exist, the cost is O(n) , while in a Dictionary<> the cost is still O(1) .
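The miss case can be seen in a short sketch: a failed search in a list scans every element before giving up, while a failed dictionary lookup is still a single hash probe:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissDemo
{
    public static void Main()
    {
        var list = Enumerable.Range(0, 5).ToList();
        var dict = list.ToDictionary(x => x);

        // A miss in a list scans all n elements before giving up: O(n).
        // List<int>.Find returns default(int), i.e. 0, when nothing matches.
        int found = list.Find(x => x == 42);
        Console.WriteLine(found);              // 0

        // A miss in a dictionary is still a single hash probe: O(1).
        bool ok = dict.TryGetValue(42, out _);
        Console.WriteLine(ok);                 // False
    }
}
```

Note the `List<int>.Find` miss is indistinguishable from finding the value 0; for reference types or with `FindIndex` (which returns -1) the miss is unambiguous.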

0

Source: https://habr.com/ru/post/896769/

