HashSet<T> Performance and LINQ Queries

Question

Last week I was handed some code and asked to improve its performance. I started working on it and soon noticed that it uses many HashSet<T> objects to store large collections (from 10,000 to more than 100,000 objects). The original authors chose HashSet<T> for performance reasons.

The only thing the code does is populate the HashSets with objects and then run a few LINQ queries across multiple collections. Most queries combine one or more HashSets or retrieve specific objects from a collection using First() or Where().

I am wondering whether this gives any performance advantage over a plain List<T>, since all the LINQ extension methods used in the code are written for IEnumerable<T>.

Many articles on the Internet say that List<T> will be faster, but some say that HashSet<T> handles huge collections much better than List<T>.

Hope someone can give me more advice.

Thanks.

performance c# hashset linq-to-objects

Chouffie Nov 13 '11 at 7:12

1 answer

codekaizen · Accepted Answer · 2011-11-13T07:22:25+0000

If you only use LINQ queries, you get no benefit, because each query simply enumerates the entire collection. In fact, List<T> may well perform better because of its contiguous internal storage.
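To make the point about enumeration concrete, here is a minimal sketch that counts how often a LINQ predicate runs over a List<T> and over a HashSet<T> holding the same values. The class and method names (LinqScanDemo, CountPredicateCalls) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqScanDemo
{
    // Fully evaluates a LINQ Where query over the sequence and returns
    // how many times the predicate was invoked.
    public static int CountPredicateCalls(IEnumerable<int> source, int wanted)
    {
        int calls = 0;
        _ = source.Where(x => { calls++; return x == wanted; }).ToList();
        return calls;
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(0, 10000).ToArray();
        var list = new List<int>(numbers);
        var set = new HashSet<int>(numbers);

        // Where only sees an IEnumerable<int>, so it visits every element
        // of either collection; the hash table is never consulted.
        Console.WriteLine(CountPredicateCalls(list, 42)); // 10000
        Console.WriteLine(CountPredicateCalls(set, 42));  // 10000
    }
}
```

Both calls report the same number of predicate invocations, which is why the choice of HashSet<T> alone does not speed up such queries.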

To get the most out of HashSet<T>, you need to use the ISet<T> methods, ideally with another HashSet<T> as the argument, because, judging from the source code, it is optimized for that case. Beyond that, only operations that use the members' hash codes, such as equality tests, will be faster, since the performance of HashSet<T> rests on the O(1) cost of a hash lookup. Operations that do not use member hash codes, such as filtering on a property of the members or on the members themselves, still have to be O(N), which makes them the same as with List<T>.
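The distinction can be sketched as follows. The Customer record and the helper names InBoth and InCity are hypothetical, introduced only for this example, and the record syntax assumes C# 9 or later. Contains and IntersectWith go through the members' hash codes, while the property filter has to visit every element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; records get value-based Equals/GetHashCode.
public sealed record Customer(int Id, string City);

public static class HashSetOpsDemo
{
    // Hash-based: intersect two sets with the ISet<T> method IntersectWith.
    // A copy is made first because IntersectWith modifies the set in place.
    public static HashSet<Customer> InBoth(HashSet<Customer> a, HashSet<Customer> b)
    {
        var result = new HashSet<Customer>(a);
        result.IntersectWith(b);
        return result;
    }

    // Not hash-based: filtering on a property scans the whole sequence,
    // whether it is a List<T> or a HashSet<T>.
    public static List<Customer> InCity(IEnumerable<Customer> source, string city)
        => source.Where(c => c.City == city).ToList();

    public static void Main()
    {
        var a = new HashSet<Customer>
        {
            new Customer(1, "Oslo"), new Customer(2, "Paris"), new Customer(3, "Oslo")
        };
        var b = new HashSet<Customer>
        {
            new Customer(2, "Paris"), new Customer(3, "Oslo"), new Customer(4, "Rome")
        };

        // Membership test by hash lookup instead of a LINQ scan.
        Console.WriteLine(a.Contains(new Customer(2, "Paris"))); // True
        Console.WriteLine(InBoth(a, b).Count);                   // 2
        Console.WriteLine(InCity(a, "Oslo").Count);              // 2
    }
}
```

In code like that described in the question, replacing LINQ calls such as Intersect or First(x => x.Equals(item)) with IntersectWith or Contains would be the place to look for gains; queries that filter on properties would not change.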
