Find duplicate in list but with criteria

Question

I want to remove duplicates from a list. If my list has:

www.test.com test.com mytest.com

I want the final list to look like this (when a domain appears both with and without www., keep only the www. form):

 www.test.com mytest.com

I have this LINQ query, but it drops every domain that does not start with www., because the Where clause only selects the www. ones:

 var result=inputList.Where(x=>x.DomainName.StartsWith("www.")).Distinct();

Edit:

@DanielHilgarth: I just ran your code and it does not give the correct results. My list contains:

 test.com www.test.com test2.com www.test2.com test3.com www.test3.com test4.com

It returns this:

 test.com www.test.com www.test2.com www.test3.com

and here is how I use your code:

    var result = lstServerBindings
        .GroupBy(x => x.DomainName.StartsWith("www.") ? x.DomainName : "www." + x)
        .Select(x =>
        {
            var domain = x.FirstOrDefault(y => y.DomainName.StartsWith("www."));
            if (domain == null)
                return x.First();
            return domain;
        });

I then use a foreach loop to copy the results into the new list:

    foreach (var item in result)
    {
        lstUniqueServerBindings.Add(new ServerBindings
        {
            IPAddress = item.IPAddress,
            PortNumber = item.PortNumber,
            DomainName = item.DomainName
        });
    }

+4

list c# linq

Zaki Jan 12 '12 at 10:15

4 answers

It was difficult, but there was a fairly simple solution:

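    using System;
    using System.Collections.Generic;

    // Orders a "www." form directly ahead of its bare counterpart;
    // every other pair (including nulls) compares as equal.
    public class wwwOrderComparison : IComparer<String>
    {
        public int Compare(string x, string y)
        {
            if (x == null || y == null) return 0;
            var xWww = x.StartsWith("www.");
            var yWww = y.StartsWith("www.");
            return (xWww && x == "www." + y) ? -1 : ((yWww && "www." + x == y) ? 1 : 0);
        }
    }

    // Treats "example.com" and "www.example.com" as the same domain.
    public class wwwEqualityComparison : IEqualityComparer<String>
    {
        public bool Equals(string x, string y)
        {
            if (x == null && y == null) return true;
            if (x == null ^ y == null) return false;
            var xWww = x.StartsWith("www.");
            var yWww = y.StartsWith("www.");
            // Exactly one side has the prefix: compare after adding it to the other.
            if (xWww ^ yWww) return xWww ? (x == "www." + y) : ("www." + x == y);
            // Both or neither have the prefix: plain string comparison.
            return x == y;
        }

        public int GetHashCode(string obj)
        {
            // Hash the "www." form so both variants land in the same bucket.
            return (obj.StartsWith("www.") ? obj : ("www." + obj)).GetHashCode();
        }
    }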

Here's the test:

    var list = new List<String>
    {
        "www.test.com", "test.com", "mytest.com",
        "abc.com", "www.abc.com", "zzz.com", "www.zzz.com"
    };
    var s = list.OrderBy(t => t, new wwwOrderComparison())
                .Distinct(new wwwEqualityComparison())
                .ToList();

It passed all my tests. Greetings a second time :)

+1

Skyrim Jan 12 '12 at 10:41

Edit: See Daniel's answer below. I was too hasty on this.

Use Select to project your elements by selecting / modifying certain properties. This may seem complicated, but all you have to do is:

 inputList.Select(x => x.Replace("www.", "")).Distinct()

Should work!

Edit: A little explanation. With select, you can basically map your old objects to new ones, and then select these objects for your query. Although in the above case you select a simple string object, you can create a completely new type of object with something like:

 Select(x => new { Content = x, ContentLength = x.Length, ContentType = x.GetType() })

Here you create a new object on the fly, based on the different properties and methods of your input objects. Select is very useful and powerful!

0

Anders Arpi Jan 12 '12 at 10:20

The usual .NET way to override what equality means (which is essentially what you are doing here) is to implement IEqualityComparer<T>.

    private class IgnoreWWWEqComparer : IEqualityComparer<string>
    {
        public bool Equals(string x, string y)
        {
            if(ReferenceEquals(x, y))
                return true;
            if(x == null || y == null)
                return false;
            if(x.StartsWith("www."))
            {
                if(y.StartsWith("www."))
                    return x.Equals(y);
                return x.Substring(4).Equals(y);
                //the above line can be made faster, but this is a reasonable
                //approach if performance isn't critical
            }
            if(y.StartsWith("www."))
                return x.Equals(y.Substring(4));
            return x.Equals(y);
        }

        public int GetHashCode(string obj)
        {
            if(obj == null)
                return 0;
            if(obj.StartsWith("www."))
                return obj.Substring(4).GetHashCode();
            return obj.GetHashCode();
        }
    }

Now Distinct() does what you want:

 var result=inputList.OrderBy(s => !s.StartsWith("www.")).Distinct(new IgnoreWWWEqComparer());

For a one-off, you may find it more convenient to group by the string with any leading "www." removed and select the first item from each group, but the above should be faster because it discards duplicates as they are found, and IgnoreWWWEqComparer can of course be reused.
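The group-by alternative mentioned above could be sketched roughly as follows. This is only an illustration, not code from the original answer: the `DomainDedup` class and the `StripWww` and `PreferWww` names are made up for the example, and it assumes the input is a plain sequence of strings. Within each group the "www." form is sorted to the front so that it is the one kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainDedup
{
    // Hypothetical helper: removes one leading "www." so both forms share a key.
    private static string StripWww(string domain) =>
        domain.StartsWith("www.", StringComparison.Ordinal) ? domain.Substring(4) : domain;

    // Keeps one entry per domain, preferring the "www." form when both exist.
    public static List<string> PreferWww(IEnumerable<string> domains) =>
        domains
            .GroupBy(d => StripWww(d))
            // false sorts before true, so the "www." form comes first in each group.
            .Select(g => g.OrderBy(d => !d.StartsWith("www.", StringComparison.Ordinal)).First())
            .ToList();
}
```

GroupBy yields groups in the order their keys first appear, so the result follows the order of the input list.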

Edit:

Given the requirement that "www." forms take precedence, the above is fine, but I'm a little bothered by the thought of how it would behave on a really big list. We would want to improve Equals and GetHashCode if we were really worried about performance, and while sorting is fine for a few dozen items, on a massive list it would start to hurt. The following is therefore not the approach I would take if there were only a small number of items (just go with the simpler one), but I would if the list could be very large:

    public static IEnumerable<string> FavourWWWDistinct(IEnumerable<string> src)
    {
        Dictionary<string, bool> dict = new Dictionary<string, bool>(new IgnoreWWWEqComparer());
        foreach(string str in src)
        {
            bool withWWW;
            if(dict.TryGetValue(str, out withWWW))
            {
                if(withWWW)
                    continue;
                if(str.StartsWith("www."))
                {
                    dict[str] = true;
                    yield return str;
                }
            }
            else
            {
                if(dict[str] = str.StartsWith("www."))
                    yield return str;
            }
        }
        foreach(var kvp in dict)
            if(!kvp.Value)
                yield return kvp.Key;
    }

This way, we yield the forms starting with "www." as soon as we see them, and only those that do not start with it have to wait until the entire list has been processed.

0

Jon Hanna Jan 12 '12 at 10:43

Daniel Hilgarth · Accepted Answer · 2012-01-12T10:25:39+0000

I think you want to have something like this:

    var result = domains
        .GroupBy(x => x.StartsWith("www.") ? x : "www." + x)
        .Select(x =>
        {
            var domain = x.FirstOrDefault(y => y.StartsWith("www."));
            if(domain == null)
                return x.First();
            return domain;
        });

I tested it with this input:

 var domains = new List<string> { "www.test.com", "test.com", "mytest.com", "abc.com", "www.abc.com" };

Result:

 www.test.com mytest.com www.abc.com

Your code should look like this (note the additional .DomainName at the end of the GroupBy key selector):

    var result = lstServerBindings
        .GroupBy(x => x.DomainName.StartsWith("www.") ? x.DomainName : "www." + x.DomainName)
        .Select(x =>
        {
            var domain = x.FirstOrDefault(y => y.DomainName.StartsWith("www."));
            if (domain == null)
                return x.First();
            return domain;
        });
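Why the missing .DomainName matters: concatenating a string with an object calls the object's ToString(), which by default returns the type name rather than the domain. Every binding without "www." therefore ends up with the same group key and the bindings collapse into a single group, which matches the output reported in the question. The sketch below illustrates this with a stripped-down, hypothetical `ServerBindings` class (assumed not to override ToString()); `GroupKeyDemo`, `BuggyKey` and `FixedKey` are names invented for the example.

```csharp
using System;

// Hypothetical stand-in for the asker's class; assumed not to override ToString().
public class ServerBindings
{
    public string DomainName { get; set; }
}

public static class GroupKeyDemo
{
    // Key selector as written in the question: "www." + x appends x.ToString(),
    // i.e. the type name, so the key is the same for every non-www item.
    public static string BuggyKey(ServerBindings x) =>
        x.DomainName.StartsWith("www.") ? x.DomainName : "www." + x;

    // Key selector with .DomainName added: appends the domain itself.
    public static string FixedKey(ServerBindings x) =>
        x.DomainName.StartsWith("www.") ? x.DomainName : "www." + x.DomainName;
}
```

With the fixed selector, test.com and www.test.com share the key www.test.com, while test4.com gets its own key.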

BTW: you can avoid the foreach loop by changing the code to this:

    var result = lstServerBindings
        .GroupBy(x => x.DomainName.StartsWith("www.") ? x.DomainName : "www." + x.DomainName)
        .Select(x =>
        {
            var item = x.FirstOrDefault(y => y.DomainName.StartsWith("www."));
            if (item == null)
                item = x.First();
            return new ServerBindings
            {
                IPAddress = item.IPAddress,
                PortNumber = item.PortNumber,
                DomainName = item.DomainName
            };
        });
