We can only guess. Since you have enough free memory overall (it just isn't contiguous in the virtual address space), the problem is most likely the inability to allocate a sufficiently large block of contiguous memory. The things that need the most contiguous memory are almost exclusively arrays, such as the backing array of your queue. When everything works correctly, the address space is compacted regularly (as part of garbage collection), which maximizes the amount of contiguous free memory available to you. If that isn't happening, something is preventing compaction from working properly, for example pinned handles such as those used for I/O.
Why does an explicit GC.Collect() help? Possibly because it happens to run at a point where all those pinned handles have been released, so compaction can actually do its job. Try using something like VMMap or CLRProfiler to see how objects are laid out in the address space. The typical symptom of a compaction problem is having something like 99% of your memory free, yet no single free block large enough for the new allocation (strings and arrays cope particularly badly with memory fragmentation). Another typical case is allocating unmanaged memory (e.g. for buffers) while neglecting to call GC.AddMemoryPressure, so the GC has no idea that it should really be collecting already. Again, CLRProfiler is very useful for seeing when collections happen and how they map to memory usage.
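For illustration, here is a minimal sketch (not from your code) of pairing unmanaged allocations with GC.AddMemoryPressure / GC.RemoveMemoryPressure; the UnmanagedBuffer wrapper and its use of Marshal.AllocHGlobal are just assumptions for the example:

using System;
using System.Runtime.InteropServices;

// Tells the GC how much unmanaged memory this object "owns", so that
// collections get triggered at sensible times.
sealed class UnmanagedBuffer : IDisposable
{
    private readonly long _size;
    public IntPtr Pointer { get; private set; }

    public UnmanagedBuffer(long size)
    {
        _size = size;
        Pointer = Marshal.AllocHGlobal((IntPtr)size);
        // Without this the GC only sees a tiny managed object and has no
        // reason to collect, no matter how much memory the process is using.
        GC.AddMemoryPressure(size);
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
            // Always balance AddMemoryPressure with RemoveMemoryPressure.
            GC.RemoveMemoryPressure(_size);
        }
    }
}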
If memory fragmentation really is the problem, you need to find out why. This is actually a somewhat involved process, and it may take some work with something like WinDbg, which is not exactly easy to use. I/O always means a few pinned buffers, so if you perform a lot of I/O operations in parallel, you interfere with the proper functioning of the GC. The GC tries to cope with this by creating additional heap segments (the details depend on the exact GC configuration you run under, but in your case server GC really is what you should be using; you are running this on Windows Server, right?), and I have seen hundreds of segments created to "work around" a fragmentation problem, but ultimately that approach is doomed to fail.
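If you are not sure which GC flavor your process actually runs under, a quick runtime check is possible (assuming .NET 4.5 or later, where GCSettings.IsServerGC exists); server GC itself is normally enabled with <gcServer enabled="true"/> in the application's config file:

using System;
using System.Runtime;

static class GcInfo
{
    // Prints which GC flavor and latency mode the current process is using.
    public static void Print()
    {
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}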
If you do need pinned handles, you really want to allocate them once and, if possible, reuse them. Pinning prevents the GC from doing its job, so you should only pin things that don't need to be moved anyway (objects on the large object heap, buffers pre-allocated low in the heap, ...), or at least keep them pinned for as short a time as possible.
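As a sketch of the "allocate once, pin once, reuse" idea (the PinnedBuffer class and its shape are illustrative, not part of your code):

using System;
using System.Runtime.InteropServices;

// One buffer, allocated up front and pinned for its whole lifetime, instead of
// pinning a fresh buffer for every I/O call.
sealed class PinnedBuffer : IDisposable
{
    private GCHandle _handle;
    public byte[] Bytes { get; private set; }
    public IntPtr Address { get; private set; }

    public PinnedBuffer(int size)
    {
        Bytes = new byte[size];
        _handle = GCHandle.Alloc(Bytes, GCHandleType.Pinned);
        Address = _handle.AddrOfPinnedObject();
    }

    public void Dispose()
    {
        if (_handle.IsAllocated)
            _handle.Free();
    }
}

Note that byte arrays of 85,000 bytes or more land on the large object heap, which is not compacted by default, so big buffers effectively stay put even without explicit pinning.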
In general, reusing buffers is a good idea. Unfortunately, that means you want to avoid string and similar constructs in code like this: string being immutable means that every single line you read has to be a separate object. Fortunately, you don't actually need string in your case; a plain byte[] buffer works just as well, you just look for the bytes 13, 10 (that is, 0x0D, 0x0A) instead of "\r\n". The main problem is that you need to keep a lot of data in memory at the same time: you either need to minimize that, or make sure the buffers are allocated where they serve you best; for the file data, a buffer on the large object heap (LOH) will do nicely.
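A tiny sketch of what the byte-level search might look like (the FindCrLf helper is mine, not from your code):

static class LineScanner
{
    // Finds the next "\r\n" (bytes 13, 10) in a buffer without creating any strings.
    // Returns the index of the '\r', or -1 if no complete terminator is in range.
    public static int FindCrLf(byte[] buffer, int start, int count)
    {
        int end = start + count - 1;            // need room for both bytes
        for (int i = start; i < end; i++)
        {
            if (buffer[i] == 13 && buffer[i + 1] == 10)
                return i;
        }
        return -1;
    }
}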
One way to avoid so many allocations is to scan through the file looking for line ends, remembering only the offset of the line you will eventually want to start copying from. As you go (using a reusable byte[] buffer), you simply keep updating the offset of "the line no more than 100,000 lines from the end"; you never allocate and free the lines themselves. Of course, this means you have to read some of the data twice, but that's simply the price of processing data that isn't fixed-length and/or indexed :)
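A rough sketch of that two-pass approach, under assumptions of my own (lines end with '\n', and the TailCopy.CopyLastLines helper, the placeholder paths and the 1 MiB buffer size are all made up for the example):

using System;
using System.IO;

static class TailCopy
{
    // Pass 1: remember the byte offsets of the last `lineCount` line starts
    // (a circular array, so memory use stays fixed).
    // Pass 2: copy everything from the oldest remembered offset onward.
    public static void CopyLastLines(string sourcePath, string targetPath, int lineCount)
    {
        var starts = new long[lineCount];       // ring of line-start offsets
        starts[0] = 0;                          // the first line starts at offset 0
        long seen = 1;
        var buffer = new byte[1 << 20];         // reusable 1 MiB buffer

        using (var source = File.OpenRead(sourcePath))
        {
            long position = 0;
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] != 10)        // '\n' ends a line...
                        continue;
                    long lineStart = position + i + 1;
                    if (lineStart < source.Length)  // ...unless it is the trailing one
                    {
                        starts[(int)(seen % lineCount)] = lineStart;
                        seen++;
                    }
                }
                position += read;
            }

            // Oldest offset still in the ring = where the last `lineCount` lines begin.
            long copyFrom = seen <= lineCount ? 0 : starts[(int)(seen % lineCount)];

            // Pass 2: stream from that offset into the target, reusing the buffer.
            source.Seek(copyFrom, SeekOrigin.Begin);
            using (var target = File.Create(targetPath))
            {
                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                    target.Write(buffer, 0, read);
            }
        }
    }
}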
Another approach is to read the file from the end. How well this works is hard to predict, since it depends heavily on how well the OS and file system handle backwards reads. In some cases it is as good as a forward read: both are sequential reads, it is just a question of whether the OS/FS is smart enough to recognize that. In other cases it will be very expensive; if so, use large file buffers (say, 16 MiB instead of the more usual 4 kiB) to squeeze out as much sequential reading as possible. Reading from the back still doesn't let you stream the data straight into another file (you would have to combine it with the first approach, or once again keep all 100,000 lines in memory at once), but it does mean you only ever read data you are actually going to use (over-reading by at most the size of your buffer).
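If you want to experiment with the backwards scan, it might look roughly like this (the 16 MiB chunks, '\n' terminators and a trailing newline at the end of the file are all assumptions of mine; the helper only locates the offset where the tail begins):

using System;
using System.IO;

static class ReverseScan
{
    // Scans the file backwards in large chunks and returns the offset where the
    // last `lineCount` lines begin. Assumes the file ends with a trailing '\n'.
    public static long FindTailStart(string path, int lineCount)
    {
        var buffer = new byte[16 * 1024 * 1024];    // 16 MiB chunks
        using (var file = File.OpenRead(path))
        {
            long position = file.Length;
            int newlines = 0;

            while (position > 0)
            {
                int toRead = (int)Math.Min(buffer.Length, position);
                position -= toRead;
                file.Seek(position, SeekOrigin.Begin);

                int read = 0;                       // fill the chunk completely
                while (read < toRead)
                {
                    int n = file.Read(buffer, read, toRead - read);
                    if (n == 0) break;
                    read += n;
                }

                for (int i = read - 1; i >= 0; i--)
                {
                    if (buffer[i] == 10 && ++newlines > lineCount)
                        return position + i + 1;    // first byte of the tail
                }
            }
        }
        return 0;                                   // the whole file fits in the tail
    }
}

The returned offset could then be fed into the streaming copy shown at the end of this answer.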
Finally, if all else fails, you can do some of the work in unmanaged memory. I hope I don't have to say that this is much more complicated than working with managed memory; you have to be very careful about correct addressing and bounds checking, among other things. For a task like yours it is still quite manageable: in the end you are just shuffling a lot of bytes around with very little actual "work". You had better understand the unmanaged world well, though, otherwise it just leads to bugs that are very hard to track down and fix.
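Just to show the mechanics (and where the responsibility for bounds checking now lies), a purely illustrative round trip through unmanaged memory could look like this:

using System;
using System.Runtime.InteropServices;

static class UnmanagedCopy
{
    // Copies a slice of a managed array into an unmanaged block and back.
    // The point of the explicit checks: nobody does them for you any more.
    public static void RoundTrip(byte[] data, int offset, int count)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (offset < 0 || count < 0 || data.Length - offset < count)
            throw new ArgumentOutOfRangeException("count");

        IntPtr block = Marshal.AllocHGlobal(count);
        try
        {
            Marshal.Copy(data, offset, block, count);   // managed -> unmanaged
            Marshal.Copy(block, data, offset, count);   // unmanaged -> managed
        }
        finally
        {
            Marshal.FreeHGlobal(block);                 // never leak on exceptions
        }
    }
}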
EDIT:
Since you have made it clear that the "last 100,000 items" is a workaround rather than the desired solution, the easiest thing is to simply stream the data instead of reading everything into RAM and writing it all out at once. If File.Copy / File.Move aren't enough for you, you can use something like this:
var buffer = new byte[4096];

using (var sourceFile = File.OpenRead(...))
using (var targetFile = File.Create(...))
{
    while (true)
    {
        var bytesRead = sourceFile.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0) break;

        targetFile.Write(buffer, 0, bytesRead);
    }
}
The only memory you need is for the (relatively small) buffer.