How do you filter a list so that no element is a substring of another element?

I have a list of strings, and some entries contain the text of other entries. I'm trying to get a separate, filtered list.

My list contains the following:

    -Customers\\Order1
    -Customers\\Order1\\Product1
    -Customers\\Order2\\Product1
    -Customers\\Order2\\Product1\\Price

From this list I need to get:

    -Customers\\Order1\\Product1
    -Customers\\Order2\\Product1\\Price

Basically, I want to omit a line if it is contained in another line in the list.

+6
5 answers

You can do this with a bit of LINQ and a foreach loop, for example:

    List<string> outputList = new List<string>();
    foreach (var str in originalList)
    {
        // Keep str only if it is not already in the output
        // and no other element contains it as a substring.
        if (!outputList.Contains(str)
            && !originalList.Any(r => r != str && r.Contains(str)))
        {
            outputList.Add(str);
        }
    }

Given that your originalList is defined as:

    List<string> originalList = new List<string>
    {
        "Customers\\Order1",
        "Customers\\Order1\\Product1",
        "Customers\\Order2\\Product1",
        "Customers\\Order2\\Product1\\Price",
    };

You will get outputList as:

    Customers\\Order1\\Product1
    Customers\\Order2\\Product1\\Price
+6

If these values are real paths and you want to handle subdirectories, you also need to handle the case where one name is a substring of another name but the two are different paths, e.g. Customer\\Order1 and Customer\\Order10.

    using System.Collections.Generic;
    using System.IO;

    public static class Extensions
    {
        public static IEnumerable<string> DistinctBySubString(this IEnumerable<string> strings)
        {
            var results = new List<string>();
            foreach (var s in strings)
            {
                bool add = true;
                // Walk backwards so RemoveAt does not disturb the indexes still to visit.
                for (int i = results.Count - 1; i >= 0; i--)
                {
                    if (IsSubDirectoryOf(results[i], s))
                    {
                        // results[i] is an ancestor of s: drop the ancestor.
                        results.RemoveAt(i);
                    }
                    else if (IsSubDirectoryOf(s, results[i]))
                    {
                        // s is an ancestor of something already kept: skip s.
                        add = false;
                    }
                }
                if (add)
                    results.Add(s);
            }
            return results;
        }

        // Returns true when dir1 is an ancestor of dir2.
        private static bool IsSubDirectoryOf(string dir1, string dir2)
        {
            DirectoryInfo di1 = new DirectoryInfo(dir1);
            DirectoryInfo di2 = new DirectoryInfo(dir2);
            bool isParent = false;
            while (di2.Parent != null)
            {
                if (di2.Parent.FullName == di1.FullName)
                {
                    isParent = true;
                    break;
                }
                else
                    di2 = di2.Parent;
            }
            return isParent;
        }
    }

Use it as follows:

    List<string> strings = new List<string>()
    {
        "Customers\\Order1",
        "Customers\\Order10",
        "Customers\\Order1\\Product1",
        "Customers\\Order2\\Product1",
        "Customers\\Order2\\Product1\\Price"
    };

    foreach (var result in strings.DistinctBySubString())
    {
        Console.WriteLine(result);
    }

The directory matching is based on the code from this answer: "Given the full path, check if the path is a subdirectory of any other path or otherwise".
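The Order1 / Order10 pitfall described above can also be handled with plain string operations, without resolving the values through DirectoryInfo. The following is only a sketch under assumptions that are not part of the answer: the names PathFilter, IsStrictlyUnder and LeafPaths are hypothetical, the separator is assumed to be a backslash, and the comparison is assumed to be case-insensitive (as is usual for Windows paths).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathFilter
{
    // Hypothetical helper: true when `child` lies strictly below `parent`.
    // Appending the separator to the parent forces a whole-segment match,
    // so "Customers\Order10" is not treated as being under "Customers\Order1".
    public static bool IsStrictlyUnder(string child, string parent, char separator = '\\')
    {
        string prefix = parent.TrimEnd(separator) + separator;
        return child.Length > prefix.Length
            && child.StartsWith(prefix, StringComparison.OrdinalIgnoreCase); // assumption: case-insensitive paths
    }

    // Keeps only the entries that have no other entry below them,
    // preserving the original order and dropping exact duplicates.
    public static List<string> LeafPaths(IEnumerable<string> paths)
    {
        var all = paths.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
        return all.Where(p => !all.Any(other => IsStrictlyUnder(other, p))).ToList();
    }
}
```

Called on the five strings from the usage example above, LeafPaths should keep Customers\\Order10, Customers\\Order1\\Product1 and Customers\\Order2\\Product1\\Price. Unlike the DirectoryInfo version, this never touches the current working directory, but it also does not normalize things like ".." or mixed separators.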

+5

The problem with the latter procedure is that you have to abort the search in the second list as soon as there is a match. Otherwise, it will keep matching against the other elements.

EDIT: new procedure:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        private static IEnumerable<string> SelectUnique(IEnumerable<string> list)
        {
            // Iterate the list.
            foreach (var item1 in list)
                // You don't want to match an item against itself.
                if (!list.Where(item2 => item1 != item2)
                    // Search for items that start with the current item
                    // (notice the ! before list.Where).
                    .Any(item2 => item2.StartsWith(item1)))
                    yield return item1;
        }

        static void Main(string[] args)
        {
            List<string> list = new List<string>();
            list.Add("Customers\\Order1\\Product1");
            list.Add("Customers\\Order2\\Product1");
            list.Add("Customers\\Order2\\Product1\\Price");
            list.Add("Customers\\Order1");
            list.Add("Customers\\Order3\\Price");

            var results = SelectUnique(list);
            foreach (var item in results)
                Console.WriteLine(item);

            Console.ReadKey();
        }
    }
+3

I think this is best done as a LINQ query.

    var input = new List<string>()
    {
        "Customers\\Order1",
        "Customers\\Order1\\Product1",
        "Customers\\Order2\\Product1",
        "Customers\\Order2\\Product1\\Price",
    };

    var query =
        from x in input
        where !input.Any(y => y != x && y.Contains(x))
        select x;

    var result = query.ToList();

This gives me:

[image: result]


In case the actual requirement is to match by subpath rather than by substring, this works:

    var input = new List<string>()
    {
        "Customers\\Order1",
        "Customers\\Order1\\Product10",
        "Customers\\Order1\\Product1",
        "Customers\\Order2\\Product1",
        "Customers\\Order2\\Product1\\Price",
    };

    // Split each path into its segments once, up front.
    var paths = input.ToDictionary(x => x, x => x.Split('\\'));

    var query =
        from x in input
        where !input
            .Any(y =>
                y.Length > x.Length
                && paths[x]
                    .Zip(paths[y], (p1, p2) => new { p1, p2 })
                    .All(p => p.p1 == p.p2))
        select x;

    var result = query.ToList();

I get this result:

[image: result2]

+3

If the order of the elements does not matter, then it is a matter of sorting your list from longest to shortest and then passing a custom equality comparer to the LINQ Distinct method.

The comparer implements both GetHashCode and Equals. Since Equals is not called when the hash codes differ, you can take hashing out of the picture by always returning 0. The rules for GetHashCode allow unequal objects to return the same hash code, so this does not violate its semantics.

The Equals method then simply checks whether the old string starts with the new string. The new string is passed as the first argument, and the old string is passed as the second argument.

Our comparer then looks like this:

    public class StartsWithEqualityComparer : IEqualityComparer<String>
    {
        #region IEqualityComparer implementation

        public bool Equals (string x, string y)
        {
            // x is the new string, y is the one already kept.
            return y.StartsWith (x);
        }

        public int GetHashCode (string obj)
        {
            // Same hash for everything, so Equals is always consulted.
            return 0;
        }

        #endregion
    }

You can then use it in a call to the Distinct method:

    var foo = list.OrderByDescending (s => s.Length)
                  .Distinct (new StartsWithEqualityComparer ())
                  .ToList ();

Finally, if necessary, you can use the Sort method to put the list back into the desired order (e.g. alphabetical order).
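One caveat about the comparer above: its Equals is not symmetric, and the approach depends on the order in which Distinct hands the two strings to Equals, which as far as I know is not something the documentation promises. A more explicit variant of the same longest-first idea could look like the sketch below. The names PrefixFilter and RemovePrefixes are hypothetical, and ordinal (case-sensitive) comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixFilter
{
    // Hypothetical helper: visits the strings from longest to shortest and
    // keeps a string only if no already-kept string starts with it.
    // Checking against the kept strings is enough: any longer string that was
    // dropped is itself a prefix of something that was kept.
    public static List<string> RemovePrefixes(IEnumerable<string> items)
    {
        var kept = new List<string>();
        foreach (var candidate in items.OrderByDescending(s => s.Length))
        {
            bool isPrefixOfKept =
                kept.Any(k => k.StartsWith(candidate, StringComparison.Ordinal)); // assumption: ordinal comparison
            if (!isPrefixOfKept)
                kept.Add(candidate);
        }
        return kept;
    }
}
```

As with the Distinct version, the result comes back ordered from longest to shortest and exact duplicates are collapsed, so a final Sort is still needed if a particular order matters. It is also still a plain prefix test, so Customers\\Order1 would be dropped in favour of Customers\\Order10.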

+2

Source: https://habr.com/ru/post/982397/

