At first glance, the approach you've taken looks fine - it should work, and there's nothing obvious that will cause, for example, a lot of garbage collection.
The main thing is that you will only be using one of those sixteen cores: there is nothing to spread the load across the other fifteen.
I think the easiest way to do this is to split the large 20 GB file into sixteen pieces, parse the pieces in parallel, and then combine the results again. The extra time spent splitting and reassembling the file should be small compared to the roughly 16x gain from scanning the sixteen pieces in parallel.
In general terms, one way to do this could be:
    private List<string> SplitFileIntoChunks(string baseFile) { /* ... */ }
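A minimal sketch of that split / parse-in-parallel / combine flow might look like the following. ParseChunk and CombineResults are illustrative placeholders for your existing per-file parsing and merging logic, and the splitter assumes the input is line-oriented text so each chunk can be cut at a line boundary:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    public class ParallelParser
    {
        private const int ChunkCount = 16;   // one chunk per core

        public void ParseInParallel(string baseFile)
        {
            // 1. Split the 20 GB file into one piece per core.
            List<string> chunkFiles = SplitFileIntoChunks(baseFile);

            // 2. Parse the pieces in parallel, one worker per chunk.
            var perChunkResults = chunkFiles
                .AsParallel()
                .AsOrdered()                              // keep chunk order for the merge
                .WithDegreeOfParallelism(ChunkCount)
                .Select(chunk => ParseChunk(chunk))
                .ToList();

            // 3. Stitch the per-chunk results back together.
            CombineResults(perChunkResults);
        }

        private List<string> SplitFileIntoChunks(string baseFile)
        {
            var chunkFiles = new List<string>();
            long approxChunkSize = new FileInfo(baseFile).Length / ChunkCount;

            using (var reader = new StreamReader(baseFile))
            {
                for (int i = 0; i < ChunkCount && !reader.EndOfStream; i++)
                {
                    string chunkPath = baseFile + ".chunk" + i;
                    chunkFiles.Add(chunkPath);

                    using (var writer = new StreamWriter(chunkPath))
                    {
                        long written = 0;
                        bool lastChunk = (i == ChunkCount - 1);
                        string line;

                        // Cut at line boundaries so no record straddles two chunks;
                        // the last chunk absorbs whatever is left over.
                        while ((written < approxChunkSize || lastChunk)
                               && (line = reader.ReadLine()) != null)
                        {
                            writer.WriteLine(line);
                            written += line.Length + Environment.NewLine.Length;
                        }
                    }
                }
            }
            return chunkFiles;
        }

        // Placeholder for your existing "scan one file" logic.
        private List<string> ParseChunk(string chunkFile)
        {
            var matches = new List<string>();
            // ... read chunkFile and collect whatever your parser produces ...
            return matches;
        }

        // Placeholder: append the partial outputs back together in chunk order.
        private void CombineResults(IEnumerable<List<string>> perChunkResults)
        {
            // ... write or return the merged result ...
        }
    }

You could also skip the physical split and have each worker seek to its own byte range of the original file, but writing separate chunk files keeps the per-worker code identical to what you already have.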
One tiny nit: move the calculation of the stream length out of the loop; you only need to get it once.
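Your exact loop isn't shown, so this is just the shape of the change:

    // Before: Length is re-queried on every iteration.
    while (stream.Position < stream.Length)
    {
        // ... read and parse ...
    }

    // After: fetch the length once, outside the loop.
    long length = stream.Length;
    while (stream.Position < length)
    {
        // ... read and parse ...
    }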
EDIT: also consider incorporating @Pavel Gatilov's idea of inverting the logic of the inner loop: split each line into its words and look each word up in the list of 12 million, rather than scanning the data once per search term.
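A sketch of that inversion, assuming the 12 million search terms fit in memory and are loaded into a HashSet so each membership test is O(1); the file names and delimiter set are only placeholders:

    // Illustrative only: file names and delimiters are assumptions.
    private static IEnumerable<string> FindMatches(string chunkFile, string wordListFile)
    {
        // Load the 12 million search terms once into a hash set so each
        // lookup is O(1) instead of a scan over the whole list.
        var searchTerms = new HashSet<string>(File.ReadLines(wordListFile),
                                              StringComparer.OrdinalIgnoreCase);

        foreach (string line in File.ReadLines(chunkFile))
        {
            // Inverted loop: walk the handful of words in each line and
            // probe the set, rather than scanning the line once per term.
            foreach (string word in line.Split(' ', '\t', ',', '.', ';'))
            {
                if (searchTerms.Contains(word))
                    yield return line;   // or record the (line, word) hit
            }
        }
    }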