I would suggest storing multiple indexes in memory.
Create one that indexes all files by file length:
Dictionary<long, List<FileInfo>> IndexBySize;
When you process a new file Fu, a single lookup finds all the other files that have the same size.
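As a rough sketch of how such an index could be built and queried: the class and method names below (SizeIndex, Build, SameSize) are made up for illustration, and the key is a long because FileInfo.Length returns a long.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SizeIndex
{
    // Groups the given files by their length in bytes.
    public static Dictionary<long, List<FileInfo>> Build(IEnumerable<FileInfo> files)
    {
        var index = new Dictionary<long, List<FileInfo>>();
        foreach (var file in files)
        {
            if (!index.TryGetValue(file.Length, out var bucket))
            {
                bucket = new List<FileInfo>();
                index[file.Length] = bucket;
            }
            bucket.Add(file);
        }
        return index;
    }

    // Returns the indexed files with the same length as the candidate,
    // or an empty list when no file of that length has been indexed.
    public static IReadOnlyList<FileInfo> SameSize(
        Dictionary<long, List<FileInfo>> index, FileInfo candidate)
    {
        if (index.TryGetValue(candidate.Length, out var bucket))
        {
            return bucket;
        }
        return Array.Empty<FileInfo>();
    }
}
```

Using TryGetValue rather than the indexer avoids a KeyNotFoundException when the new file has a size that has not been seen before.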
Create another one that indexes all files by the modification timestamp:
Dictionary<DateTime, List<FileInfo>> IndexByModification;
Given file Fu, you can find all the files modified at the same time.
Repeat for each significant characteristic. You can then use the Intersect() extension method to compare multiple criteria efficiently.
For instance:
var matchingFiles = IndexBySize[fu.Size].Intersect(IndexByModification[fu.Modified]);
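A slightly more defensive version of that lookup might look like the sketch below. It assumes Fu is a FileInfo, that the modification index is keyed by LastWriteTimeUtc, and that both indexes hold the same FileInfo instances (FileInfo does not override Equals, so Intersect() compares by reference). CandidateLookup and FindCandidates are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CandidateLookup
{
    // Returns the files that match fu on both size and modification time.
    // Assumes both indexes store the same FileInfo objects, because
    // Intersect() falls back to reference equality for FileInfo.
    public static List<FileInfo> FindCandidates(
        Dictionary<long, List<FileInfo>> indexBySize,
        Dictionary<DateTime, List<FileInfo>> indexByModification,
        FileInfo fu)
    {
        // A missing key in either index means there can be no match.
        if (!indexBySize.TryGetValue(fu.Length, out var sameSize) ||
            !indexByModification.TryGetValue(fu.LastWriteTimeUtc, out var sameTime))
        {
            return new List<FileInfo>();
        }
        return sameSize.Intersect(sameTime).ToList();
    }
}
```

If the two indexes are built from separate FileInfo objects, passing an IEqualityComparer<FileInfo> that compares FullName to Intersect() would be one way around the reference-equality issue.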
Filtering on these cheap attributes first lets you put off the expensive scan of file contents until you actually need it. Then, for the files that have been hashed, create another index:
Dictionary<MD5Hash, List<FileInfo>> IndexByHash;
You might want to compute multiple hashes in the same pass to reduce the number of collisions.
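One possible way to do this, as a sketch only: read each file once and feed every chunk to two hash algorithms, then use the combined digest as the dictionary key. MD5Hash above is not a framework type, so this example assumes a string key instead. The choice of MD5 plus SHA-256 and the names HashIndex, ComputeKey and Add are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class HashIndex
{
    // Reads the stream once and updates both hashes with every chunk,
    // then joins the two digests into a single string key.
    public static string ComputeKey(Stream stream)
    {
        using (var md5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        using (var sha256 = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            var buffer = new byte[81920];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                md5.AppendData(buffer, 0, read);
                sha256.AppendData(buffer, 0, read);
            }
            return BitConverter.ToString(md5.GetHashAndReset()) + ":" +
                   BitConverter.ToString(sha256.GetHashAndReset());
        }
    }

    // Hashes the file and adds it to the bucket for its combined key.
    public static void Add(Dictionary<string, List<FileInfo>> indexByHash, FileInfo file)
    {
        string key;
        using (var stream = file.OpenRead())
        {
            key = ComputeKey(stream);
        }
        if (!indexByHash.TryGetValue(key, out var bucket))
        {
            bucket = new List<FileInfo>();
            indexByHash[key] = bucket;
        }
        bucket.Add(file);
    }
}
```

Two files land in the same bucket only if both digests agree, which makes an accidental collision far less likely than with a single hash, at the cost of some extra CPU time per file.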
Bevan