"Function Evaluation Timed Out" on a large list (LINQ query in C#)

I am using the following query

var queryList1Only = (from file in list1 select file).Except(list2, myFileCompare); 

where myFileCompare matches two files based on name and length.

The query returned results while list1 and list2 were small (say, 100 files during testing). Then I increased list1 to 30,000 files and list2 to 20,000 files, and now the query fails with "Function Evaluation Timed Out".

I searched the Internet and found that the debugger can trigger this, so I removed all breakpoints and ran the code again; now the program just freezes, never producing the queryList1Only output I am trying to print.

EDIT: This is the code for myFileCompare

 class FileCompare : System.Collections.Generic.IEqualityComparer<System.IO.FileInfo>
 {
     public FileCompare() { }

     public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
     {
         return (f1.Name == f2.Name &&
                 f1.Directory.Name == f2.Directory.Name &&
                 f1.Length == f2.Length);
     }

     // Return a hash that reflects the comparison criteria. According to the
     // rules for IEqualityComparer<T>, if Equals is true, then the hash codes
     // must also be equal. Because equality as defined here is a simple value
     // equality, not reference identity, it is possible that two or more
     // objects will produce the same hash code.
     public int GetHashCode(System.IO.FileInfo fi)
     {
         string s = String.Format("{0}{1}", fi.Name, fi.Length);
         return s.GetHashCode();
     }
 }
3 answers

What do you need to do with the items returned by the query? In any case, such a heavy operation is a good candidate for running on a separate thread, to avoid exactly the kind of freeze you are seeing.

EDIT: idea

As an example, you can try the following algorithm:

  • Sort the elements of both arrays with QuickSort ( List<T>.Sort() uses it by default ); this will be fast with a good comparer.
  • Then walk both sorted lists in a single for() loop, comparing the items the two indices point at and advancing the index of the smaller one.
  • When one index reaches the end of its list, take all remaining elements of the other list as different (they do not exist in the first list at all).

I believe that with sorted arrays you will get much better performance; with a poor hash function, Except() can degrade toward O(m·n).
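The sort-and-merge idea above can be sketched as follows. To keep it self-contained it works on string keys (e.g. "name|length") rather than FileInfo, and SortedDiff / Difference are names I made up; it assumes keys are unique within each list.

```csharp
using System;
using System.Collections.Generic;

static class SortedDiff
{
    // Returns the items of 'first' that have no match in 'second',
    // by sorting both lists and walking them with two indices.
    public static List<string> Difference(List<string> first, List<string> second)
    {
        first.Sort(StringComparer.Ordinal);
        second.Sort(StringComparer.Ordinal);

        var result = new List<string>();
        int i = 0, j = 0;
        while (i < first.Count && j < second.Count)
        {
            int cmp = string.CompareOrdinal(first[i], second[j]);
            if (cmp == 0) { i++; j++; }                      // present in both: skip
            else if (cmp < 0) { result.Add(first[i]); i++; } // only in first: keep
            else { j++; }                                    // only in second: ignore
        }
        // everything left in 'first' has no counterpart in 'second'
        for (; i < first.Count; i++) result.Add(first[i]);
        return result;
    }
}
```

After the O(m log m + n log n) sorts, the walk itself is a single O(m + n) pass.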

EDIT: another idea that should be really fast

  • Store the files from one server in a Set<T>
  • Then loop through the second array and look each item up in the Set<T> ; this will be VERY fast! Building the set is roughly O(m) and the lookups O(n) amortized, because you only walk one array and probe a set with a good hash function (use the updated GetHashCode() logic I provided). Give it a try!
 // some kind of C# pseudocode ;)
 public IEnumerable<FileInfo> GetDifference()
 {
     ISet<FileInfo> firstServerFilesMap = new HashSet<FileInfo>();
     // ... add the first server's files with firstServerFilesMap.Add(...)

     List<FileInfo> secondServerFiles = new List<FileInfo>();
     // ... add the second server's files with secondServerFiles.Add(...)

     foreach (var secondServerFile in secondServerFiles)
     {
         if (!firstServerFilesMap.Contains(secondServerFile))
         {
             yield return secondServerFile;
         }
     }
 }

EDIT: For more information on equality logic, see comments.

Try this implementation:

 public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
 {
     if (f1 == null || f2 == null)
     {
         return false;
     }

     return (f1.Name == f2.Name &&
             f1.Directory.Name == f2.Directory.Name &&
             f1.Length == f2.Length);
 }

 public int GetHashCode(System.IO.FileInfo fi)
 {
     unchecked
     {
         int hash = 17;
         hash = hash * 23 + fi.Name.GetHashCode();
         hash = hash * 23 + fi.Directory.Name.GetHashCode();
         hash = hash * 23 + fi.Length.GetHashCode();
         return hash;
     }
 }
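A comparer like this plugs straight into Except. Because FileInfo.Length throws if the file no longer exists on disk, this runnable sketch substitutes a small stand-in record (FileEntry and FileEntryCompare are names I made up); the comparer mirrors the Name/Directory/Length logic above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for FileInfo so the sketch runs without touching the file system.
record FileEntry(string Name, string Directory, long Length);

class FileEntryCompare : IEqualityComparer<FileEntry>
{
    public bool Equals(FileEntry f1, FileEntry f2)
    {
        if (f1 == null || f2 == null) return false;
        return f1.Name == f2.Name &&
               f1.Directory == f2.Directory &&
               f1.Length == f2.Length;
    }

    public int GetHashCode(FileEntry fi)
    {
        unchecked
        {
            // Combine all three equality criteria, as in the answer above.
            int hash = 17;
            hash = hash * 23 + fi.Name.GetHashCode();
            hash = hash * 23 + fi.Directory.GetHashCode();
            hash = hash * 23 + fi.Length.GetHashCode();
            return hash;
        }
    }
}
```

Usage is then just `var onlyIn1 = list1.Except(list2, new FileEntryCompare()).ToList();` — Except hashes list2 once and streams list1 through it.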



I have not tried this myself, but here is the idea: put list1 into a HashSet, like this:

 HashSet<FileInfo> List1 = new HashSet<FileInfo>(myFileCompare); 

Add all files:

 foreach (var file in files)
 {
     List1.Add(file);
 }

Then remove the items:

 List1.ExceptWith(list2); 

Then enumerate the result:

 foreach(var file in List1) { //do something } 

I think this will be faster, but as I said, I have not tried it; see the HashSet<T> documentation for general information.

Edit: Or even better, you can initialize the set and add the data in one step:

 HashSet<FileInfo> List1 = new HashSet<FileInfo>(files, myFileCompare); 
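Putting these steps together, here is a minimal end-to-end sketch of the HashSet flow. String keys (e.g. "name|length") stand in for FileInfo so it runs anywhere, and HashSetDiff / Only1 are names I made up; with FileInfo you would pass myFileCompare to the HashSet constructor instead.

```csharp
using System;
using System.Collections.Generic;

static class HashSetDiff
{
    // Returns the elements of files1 that do not appear in files2.
    public static HashSet<string> Only1(IEnumerable<string> files1,
                                        IEnumerable<string> files2)
    {
        // initialize and fill in one step
        var set = new HashSet<string>(files1);
        // removes every element that also appears in files2, in place
        set.ExceptWith(files2);
        return set;
    }
}
```

Unlike LINQ's Except, ExceptWith mutates the set in place and allocates no new collection, which helps at the 30,000-file scale in the question.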

I would recommend removing the length from the hash code and hashing just fi.FullName. That still captures uniqueness, although in cases where the length is needed to distinguish two files there may be hash collisions; that is probably still preferable to a long-running Except. Similarly, change the equality comparison from name and directory to full name, which is likely to be more efficient.
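One reading of this suggestion as code (FullNameCompare is a name I made up, and I have kept Length in Equals on the assumption that it is still needed to distinguish files whose paths collide only in the hash):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class FullNameCompare : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo f1, FileInfo f2)
    {
        if (f1 == null || f2 == null) return false;
        // FullName already encodes both directory and name.
        return f1.FullName == f2.FullName && f1.Length == f2.Length;
    }

    // Hash on the path alone; files equal under Equals always share a
    // FullName, so the IEqualityComparer<T> contract still holds, at the
    // cost of collisions between same-path candidates of different length.
    public int GetHashCode(FileInfo fi) => fi.FullName.GetHashCode();
}
```

Note that FileInfo.Length reads the file system, so this comparer only works on files that still exist on disk.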


Source: https://habr.com/ru/post/895269/

