How to efficiently compare two large sorted lists in C#?

I have two generic lists with 20,000 and 30,000 objects respectively.

class Employee { string name; double salary; }

List<Employee> newEmployeeList = new List<Employee>() { .... }; // contains 20,000 objects
List<Employee> oldEmployeeList = new List<Employee>() { .... }; // contains 30,000 objects

The lists can also be sorted by name if that improves speed.

I want to compare these two lists to find out

  • employees whose name and salary match
  • employees whose name is the same but not salary

What is the fastest way to compare such large lists of data with the above conditions?

+6
5 answers

I would sort newEmployeeList and oldEmployeeList by name - O(n*log(n)). Then you can use a linear pass to find the matches, so the total is O(n + n*log(n)) if both lists are about the same size. This should be faster than the O(n^2) brute-force algorithm.
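
For example, a minimal sketch of the sorting step, assuming the name field is accessible to the calling code (the linear pass itself is essentially the merge shown in a later answer):

// Sort both lists by name with an ordinal comparison
newEmployeeList.Sort((a, b) => string.Compare(a.name, b.name, StringComparison.Ordinal));
oldEmployeeList.Sort((a, b) => string.Compare(a.name, b.name, StringComparison.Ordinal));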

+2

I would recommend storing the two lists in a Dictionary<string, Employee> keyed by name to start with. Then you can iterate over the keys of one dictionary and check, in the other, whether the key exists and whether the salaries match. This also saves the cost of sorting them later or copying them into a more efficient structure.

It is pretty much O(n): linear to build both dictionaries, and linear to go through the keys of one and look them up in the other, since O(n + m + n) reduces to O(n).
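
A rough sketch of that dictionary approach, assuming names are unique within each list and the Employee fields are accessible:

// Build one dictionary per list, keyed by name
var newByName = newEmployeeList.ToDictionary(e => e.name);
var oldByName = oldEmployeeList.ToDictionary(e => e.name);

foreach (var name in newByName.Keys)
{
    Employee oldEmp;
    if (oldByName.TryGetValue(name, out oldEmp))
    {
        // Present in both lists: compare salaries to separate the two cases
        bool fullMatch = oldEmp.salary == newByName[name].salary;
    }
}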

But, if you have to use List<T> to store the lists for other reasons, you can also use the LINQ Join() method and create a new list with a Match field that tells you whether each pair was a match or a mismatch...

var results = newEmpList.Join(oldEmpList,
    n => n.Name,
    o => o.Name,
    (n, o) => new { Name = n.Name, Salary = n.Salary, Match = o.Salary == n.Salary });

You can then filter the results with a Where() clause on Match or !Match.
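
For example, following on from the query above (note that Join() only returns pairs whose names appear in both lists):

var matches    = results.Where(r => r.Match).ToList();   // name and salary match
var mismatches = results.Where(r => !r.Match).ToList();  // same name, different salary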

+2

Update: I assume (from the title of your question) that the two lists are already sorted. Perhaps they are stored in a database with a clustered index or something similar. This answer is based on that assumption.

Here is an implementation that has O(n) complexity, is very fast, and is also fairly simple.
I believe it is a variant of the merge algorithm.

Here is the idea:

  • Start enumerating both lists.
  • Compare the two current items.
  • If they match, add them to the results.
  • If the first item is less, advance the first list.
  • If the second item is less, advance the second list.

Since both lists are known to be sorted, this will work very well. This implementation assumes that name is unique in each list.

var comparer = StringComparer.OrdinalIgnoreCase;
var namesAndSalaries = new List<Tuple<Employee, Employee>>();
var namesOnly = new List<Tuple<Employee, Employee>>();

// Create 2 iterators; one for old, one for new:
using (IEnumerator<Employee> A = oldEmployeeList.GetEnumerator())
using (IEnumerator<Employee> B = newEmployeeList.GetEnumerator())
{
    // Start enumerating both:
    if (A.MoveNext() && B.MoveNext())
    {
        while (true)
        {
            int compared = comparer.Compare(A.Current.name, B.Current.name);
            if (compared == 0)
            {
                // Names match: check the salaries
                if (A.Current.salary == B.Current.salary)
                {
                    namesAndSalaries.Add(Tuple.Create(A.Current, B.Current));
                }
                else
                {
                    namesOnly.Add(Tuple.Create(A.Current, B.Current));
                }
                if (!A.MoveNext() || !B.MoveNext()) break;
            }
            else if (compared < 0)
            {
                // A is behind: keep advancing A
                if (!A.MoveNext()) break;
            }
            else
            {
                // B is behind: keep advancing B
                if (!B.MoveNext()) break;
            }
        }
    }
}
+2

You can create a dictionary using

 var lookupDictionary = list1.ToDictionary(x=>x.name); 

This gives you close to O(1) lookups, and close to O(n) behavior overall if you look values up in a loop over the other list.

(I assume ToDictionary is O(n), which makes sense for a straightforward implementation, but I have not checked.)

This makes for a very straightforward algorithm, and I think going below O(n) with two unsorted lists is quite difficult.
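
A short sketch of that lookup loop, assuming the dictionary is built from one list and the Employee fields are accessible:

var lookupDictionary = oldEmployeeList.ToDictionary(x => x.name);
var salaryChanged = new List<Employee>();

foreach (var emp in newEmployeeList)
{
    Employee existing;
    if (lookupDictionary.TryGetValue(emp.name, out existing) && existing.salary != emp.salary)
    {
        salaryChanged.Add(emp); // same name, different salary
    }
}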

+1

With sorted lists, one of the quickest solutions is to use BinarySearch to find each item in the other list.
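
A minimal sketch of that idea, assuming the Employee fields are accessible; NameComparer is a hypothetical helper, and BinarySearch requires the list to be sorted with the same comparer:

class NameComparer : IComparer<Employee>
{
    public int Compare(Employee x, Employee y)
    {
        return string.Compare(x.name, y.name, StringComparison.Ordinal);
    }
}

var byName = new NameComparer();
oldEmployeeList.Sort(byName); // must be sorted with the same comparer used for the search

foreach (var emp in newEmployeeList)
{
    int index = oldEmployeeList.BinarySearch(emp, byName); // O(log n) per lookup
    if (index >= 0)
    {
        bool salaryMatches = oldEmployeeList[index].salary == emp.salary;
    }
}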

But, as others have said, you should measure it against your project requirements, as performance often tends to be subjective.

0

Source: https://habr.com/ru/post/905567/

