The usual .NET way to override what equality means (which is essentially what you are doing here) is to implement IEqualityComparer<T>.
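    private class IgnoreWWWEqComparer : IEqualityComparer<string>
    {
        public bool Equals(string x, string y)
        {
            if(ReferenceEquals(x, y))
                return true;
            if(x == null || y == null)
                return false;
            if(x.StartsWith("www."))
            {
                if(y.StartsWith("www."))
                    return x.Equals(y);
                return x.Substring(4).Equals(y);
            }
            // The listing was cut off here. What follows is the symmetric case,
            // plus the GetHashCode that IEqualityComparer<string> requires.
            if(y.StartsWith("www."))
                return x.Equals(y.Substring(4));
            return x.Equals(y);
        }

        public int GetHashCode(string obj)
        {
            if(obj == null)
                return 0;
            // Hash the form without "www." so that equal strings hash equally.
            if(obj.StartsWith("www."))
                return obj.Substring(4).GetHashCode();
            return obj.GetHashCode();
        }
    }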
Now Distinct() does what you want:
    var result = inputList.OrderBy(s => !s.StartsWith("www.")).Distinct(new IgnoreWWWEqComparer());
Alternatively, you may find it more convenient to group by the string with any leading "www." removed and select the first from each group, but the above should be faster because it discards duplicates as they are found, and of course the IgnoreWWWEqComparer can be reused.
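For comparison, here is a minimal sketch of that grouping alternative. The names `GroupByAlternative`, `StripWww` and `DistinctIgnoringWww` are made up for illustration; the sort is the same one used with Distinct() above, so each group's first element is a "www." form whenever one exists.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByAlternative
{
    // Hypothetical helper: removes a leading "www." so both forms share one grouping key.
    private static string StripWww(string s) =>
        s.StartsWith("www.") ? s.Substring(4) : s;

    public static IEnumerable<string> DistinctIgnoringWww(IEnumerable<string> source) =>
        source
            // OrderBy is a stable sort and false sorts before true,
            // so "www." forms come first and otherwise keep their relative order.
            .OrderBy(s => !s.StartsWith("www."))
            // GroupBy keeps elements within a group in source order.
            .GroupBy(StripWww)
            // The first element of each group is therefore the preferred form.
            .Select(g => g.First());
}
```

Unlike the comparer version, this buffers every group before anything is returned, which is probably fine for short lists.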
Edit:
Given the requirement that "www." forms take precedence, the above is fine, but I'm a little bothered by how it would behave with a really big list. We would want to improve our Equals and GetHashCode if we really cared about performance, and while sorting is fine for a few dozen items, on a massive list it starts to hurt. So the following is not the approach I would take for a small number of items (just go with the simpler version above), but it is the one I would use if the list were very large:
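    public static IEnumerable<string> FavourWWWDistinct(IEnumerable<string> src)
    {
        // Value is true once a "www." form of the key has been yielded.
        Dictionary<string, bool> dict = new Dictionary<string, bool>(new IgnoreWWWEqComparer());
        foreach(string str in src)
        {
            bool withWWW;
            if(dict.TryGetValue(str, out withWWW))
            {
                if(withWWW)
                    continue; // a "www." form was already yielded
                if(str.StartsWith("www."))
                {
                    // First "www." form for a name previously seen only without it.
                    dict[str] = true;
                    yield return str;
                }
            }
            else
            {
                // First sighting: record it, and yield at once if it is a "www." form.
                bool startsWithWWW = str.StartsWith("www.");
                dict[str] = startsWithWWW;
                if(startsWithWWW)
                    yield return str;
            }
        }
        // Whatever is still false never had a "www." form, so yield the plain form now.
        foreach(var kvp in dict)
            if(!kvp.Value)
                yield return kvp.Key;
    }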
This way, we yield the forms starting with "www." as soon as we see them, and only those that do not start with it have to wait for the whole list to be processed.
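To make that ordering concrete, here is a small self-contained sketch. It repeats the comparer (condensed through a hypothetical `Strip` helper) and the method so that it compiles by itself; the wrapper class `FavourWwwDemo` and the sample host names are assumptions for illustration only.

```csharp
using System.Collections.Generic;

public static class FavourWwwDemo
{
    // Condensed variant of the comparer: compares and hashes the form without "www.".
    private sealed class IgnoreWWWEqComparer : IEqualityComparer<string>
    {
        private static string Strip(string s) =>
            s.StartsWith("www.") ? s.Substring(4) : s;

        public bool Equals(string x, string y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            return Strip(x).Equals(Strip(y));
        }

        public int GetHashCode(string obj) =>
            obj == null ? 0 : Strip(obj).GetHashCode();
    }

    public static IEnumerable<string> FavourWWWDistinct(IEnumerable<string> src)
    {
        // Value is true once a "www." form of the key has been yielded.
        var dict = new Dictionary<string, bool>(new IgnoreWWWEqComparer());
        foreach (string str in src)
        {
            if (dict.TryGetValue(str, out bool withWWW))
            {
                if (withWWW) continue;
                if (str.StartsWith("www."))
                {
                    dict[str] = true;
                    yield return str;
                }
            }
            else
            {
                bool startsWithWWW = str.StartsWith("www.");
                dict[str] = startsWithWWW;
                if (startsWithWWW) yield return str;
            }
        }
        // Names that never appeared with "www." are only known once the input ends.
        foreach (var kvp in dict)
            if (!kvp.Value) yield return kvp.Key;
    }
}
```

Given the input "example.com", "www.test.org", "foo.net", "www.example.com", "test.org", this yields "www.test.org" and "www.example.com" during the first pass, and "foo.net" only after the input is exhausted.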