C#: Determine whether a list contains duplicates

Requirement: determine whether an unsorted list contains a duplicate. My usual approach would be an O(n²) nested loop. I wonder how others solve this. Is there an elegant, high-performance way to do it in LINQ? Something generic that accepts a lambda or a comparer would be nice.

+45
list generics c# algorithm linq
Feb 22 '11 at 15:59
9 answers

If I'm not missing something, you can get away with something simple using Distinct(). Granted, this is not the most efficient implementation you could come up with, but it will tell you whether any duplicates were removed:

    var list = new List<string>();
    // Fill the list
    if (list.Count != list.Distinct().Count())
    {
        // Duplicates exist
    }
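Since the question asks for something that accepts a comparer, here is a minimal sketch of the same count comparison using the Distinct overload that takes an IEqualityComparer<T>. The class name DistinctCheck and the sample data are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctCheck
{
    // Hypothetical helper: true if at least two items are equal under the given comparer.
    public static bool HasDuplicates<T>(List<T> items, IEqualityComparer<T> comparer)
    {
        // Distinct drops every repeated item, so a smaller count means a duplicate existed.
        return items.Count != items.Distinct(comparer).Count();
    }
}
```

For example, with var names = new List<string> { "alice", "Bob", "ALICE" }, calling DistinctCheck.HasDuplicates(names, StringComparer.Ordinal) returns false, while passing StringComparer.OrdinalIgnoreCase returns true. Like the original snippet, this always enumerates the whole list.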
+94
Feb 22 '11 at 16:01

According to Eric White's article on Finding Duplicates Using LINQ:

An easy way to find duplicates is to write a query that groups by identifier and then filters for groups that have more than one member. In the following example, we want to find out that 4 and 3 are duplicates:

    int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
    var duplicates = listOfItems
        .GroupBy(i => i)
        .Where(g => g.Count() > 1)
        .Select(g => g.Key);
    foreach (var d in duplicates)
        Console.WriteLine(d); // prints 4, then 3
+28
Feb 22 '11 at 16:03

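Put all the elements into a set; if the set's count differs from the list's count, there is a duplicate. The version below goes one step further and returns as soon as Add reports an element that is already in the set.

    bool hasDuplicates<T>(List<T> myList)
    {
        var hs = new HashSet<T>();
        for (var i = 0; i < myList.Count; ++i)
        {
            // Add returns false when the element is already present
            if (!hs.Add(myList[i])) return true;
        }
        return false;
    }

This should be more efficient than Distinct, because it can stop at the first duplicate instead of going through the entire list.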

+13
Feb 22 '11 at 16:04

To short-circuit when a duplicate appears early in the list, you can add the items to a HashSet<T> and check the return value of its .Add method.

Using .Any, you can stop the enumeration as soon as you find a duplicate.

Here's a LINQ extension method in both C# and VB:

C#:

    public static bool ContainsDuplicates<T>(this IEnumerable<T> enumerable)
    {
        var knownKeys = new HashSet<T>();
        return enumerable.Any(item => !knownKeys.Add(item));
    }

Visual Basic:

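    <Extension>
    Public Function ContainsDuplicates(Of T)(ByVal enumerable As IEnumerable(Of T)) As Boolean
        Dim knownKeys As New HashSet(Of T)
        Return enumerable.Any(Function(item) Not knownKeys.Add(item))
    End Function

Note: to check that there are no duplicates, change Any to All and drop the negation, i.e. enumerable.All(item => knownKeys.Add(item)). Changing Any to All alone is not enough, because the negated predicate is false for the first item.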
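The question also asks for something that accepts a lambda or a comparer. A minimal sketch of how the extension method above could be generalised is shown here; the name ContainsDuplicatesBy and its parameters are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateExtensions
{
    // Hypothetical overload: two items count as duplicates when their keys
    // are equal under the supplied comparer.
    public static bool ContainsDuplicatesBy<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        var seenKeys = new HashSet<TKey>(comparer);
        // Add returns false for a key already in the set; Any stops at the first such item.
        return source.Any(item => !seenKeys.Add(keySelector(item)));
    }
}
```

For example, given var words = new[] { "apple", "Avocado", "banana" }, the call words.ContainsDuplicatesBy(w => w.Substring(0, 1), StringComparer.OrdinalIgnoreCase) returns true because "apple" and "Avocado" share an initial when case is ignored, whereas the same call with StringComparer.Ordinal returns false.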

+12
May 22 '14 at 4:24

Something along these lines is relatively simple and will give you the number of occurrences of each item.

    var something = new List<string>() { "One", "One", "Two", "Three" };
    var dictionary = new Dictionary<string, int>();
    something.ForEach(s =>
    {
        if (dictionary.ContainsKey(s))
        {
            dictionary[s]++;
        }
        else
        {
            dictionary[s] = 1;
        }
    });

I assume this is similar to the Distinct implementation, although I'm not sure.
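As a rough sketch of how the counts could then be used, the helper below builds the same kind of dictionary for any element type, and the usage note filters it for keys seen more than once. The names DuplicateCounter and CountOccurrences are illustrative assumptions, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCounter
{
    // Hypothetical helper: maps each distinct item to the number of times it occurs.
    public static Dictionary<T, int> CountOccurrences<T>(IEnumerable<T> items)
    {
        var counts = new Dictionary<T, int>();
        foreach (var item in items)
        {
            // TryGetValue leaves seen at 0 when the key is absent.
            counts.TryGetValue(item, out int seen);
            counts[item] = seen + 1;
        }
        return counts;
    }
}
```

With the list from the answer, DuplicateCounter.CountOccurrences(something) maps "One" to 2 and the other two strings to 1, so counts.Where(kv => kv.Value > 1).Select(kv => kv.Key) yields only "One".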

+2
Feb 22 '11 at 16:08

You can use the Distinct() extension method for IEnumerable<T>.

+1
Feb 22 '11 at 16:04

If the items are integers or otherwise orderable, insert them into a binary tree for O(n log n) performance.

Alternatively, sort the list with any fast sorting algorithm and then just check that each value differs from the previous one.
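The sort-then-scan idea can be sketched as follows. This is only an illustration under the assumption that an IComparer<T> is available for the items; the class name SortedDuplicateCheck is made up.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDuplicateCheck
{
    // Hypothetical helper: sorts a copy of the items, then compares neighbours.
    public static bool HasDuplicates<T>(IEnumerable<T> items, IComparer<T> comparer)
    {
        var sorted = new List<T>(items); // copy so the caller's collection is left untouched
        sorted.Sort(comparer);           // equal items end up next to each other

        for (int i = 1; i < sorted.Count; i++)
        {
            if (comparer.Compare(sorted[i - 1], sorted[i]) == 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

For example, SortedDuplicateCheck.HasDuplicates(new[] { 4, 2, 3, 1, 6, 4, 3 }, Comparer<int>.Default) returns true. Unlike the HashSet approaches, this needs an ordering rather than a hash code, at the cost of sorting the whole input first.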

+1
Feb 22 '11 at 19:27

Use Enumerable.Any with HashSet.Add like:

    List<string> list = new List<string> { "A", "A", "B", "C", "D" };
    HashSet<string> hashSet = new HashSet<string>();
    if (list.Any(r => !hashSet.Add(r)))
    {
        // duplicate exists.
    }

HashSet.Add will return false if the item already exists in the HashSet. This will not iterate over the entire list.

+1
Sep 16 '14 at 19:06

You can use the Enumerable.GroupBy method.

    var list = new List<string> { "1", "2", "3", "1", "2" };
    // A group with a second element means its key occurs more than once
    var hasDuplicates = list.GroupBy(x => x).Any(x => x.Skip(1).Any());
+1
Jun 19 '15 at 4:59