We can only guess. Since you have enough free memory overall (it just isn't contiguous in the virtual address space), the problem is most likely the inability to allocate a sufficiently large block of contiguous memory. The things that need the most contiguous memory are almost exclusively arrays, such as the backing array of your queue. When everything works correctly, the address space is compacted regularly (as part of garbage collection), which maximizes the amount of contiguous free memory available to you. If that isn't happening, something is preventing compaction from working properly, for example pinned handles such as those used for I/O.
Why does an explicit GC.Collect() help? Possibly because it happens to run at a point where all those pinned handles have been released, so compaction can actually do its job. Try using something like VMMap or CLRProfiler to see how objects are laid out in the address space. The typical symptom of a compaction problem is having something like 99% of your memory free, yet no single free block large enough for the new allocation (strings and arrays cope particularly badly with memory fragmentation). Another typical case is allocating unmanaged memory (e.g. for buffers) while neglecting to call GC.AddMemoryPressure, so the GC has no idea that it should really be collecting already. Again, CLRProfiler is very useful for seeing when collections happen and how they map to memory usage.
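For illustration, here is a minimal sketch (not from your code) of pairing unmanaged allocations with GC.AddMemoryPressure / GC.RemoveMemoryPressure; the UnmanagedBuffer wrapper and its use of Marshal.AllocHGlobal are just assumptions for the example:

using System;
using System.Runtime.InteropServices;

// Tells the GC how much unmanaged memory this object "owns", so that
// collections get triggered at sensible times.
sealed class UnmanagedBuffer : IDisposable
{
    private readonly long _size;
    public IntPtr Pointer { get; private set; }

    public UnmanagedBuffer(long size)
    {
        _size = size;
        Pointer = Marshal.AllocHGlobal((IntPtr)size);
        // Without this the GC only sees a tiny managed object and has no
        // reason to collect, no matter how much memory the process is using.
        GC.AddMemoryPressure(size);
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
            // Always balance AddMemoryPressure with RemoveMemoryPressure.
            GC.RemoveMemoryPressure(_size);
        }
    }
}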
If memory fragmentation really is the problem, you need to find out why. This is actually a somewhat involved process, and it may take some work with something like WinDbg, which is not exactly easy to use. I/O always means a few pinned buffers, so if you perform a lot of I/O operations in parallel, you interfere with the proper functioning of the GC. The GC tries to cope with this by creating additional heap segments (the details depend on the exact GC configuration you run under, but in your case server GC really is what you should be using; you are running this on Windows Server, right?), and I have seen hundreds of segments created to "work around" a fragmentation problem, but ultimately that approach is doomed to fail.
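If you are not sure which GC flavor your process actually runs under, a quick runtime check is possible (assuming .NET 4.5 or later, where GCSettings.IsServerGC exists); server GC itself is normally enabled with <gcServer enabled="true"/> in the application's config file:

using System;
using System.Runtime;

static class GcInfo
{
    // Prints which GC flavor and latency mode the current process is using.
    public static void Print()
    {
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}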
If you do need pinned handles, you really want to allocate them once and, if possible, reuse them. Pinning prevents the GC from doing its job, so you should only pin things that don't need to be moved anyway (objects on the large object heap, buffers pre-allocated low in the heap, ...), or at least keep them pinned for as short a time as possible.
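As a sketch of the "allocate once, pin once, reuse" idea (the PinnedBuffer class and its shape are illustrative, not part of your code):

using System;
using System.Runtime.InteropServices;

// One buffer, allocated up front and pinned for its whole lifetime, instead of
// pinning a fresh buffer for every I/O call.
sealed class PinnedBuffer : IDisposable
{
    private GCHandle _handle;
    public byte[] Bytes { get; private set; }
    public IntPtr Address { get; private set; }

    public PinnedBuffer(int size)
    {
        Bytes = new byte[size];
        _handle = GCHandle.Alloc(Bytes, GCHandleType.Pinned);
        Address = _handle.AddrOfPinnedObject();
    }

    public void Dispose()
    {
        if (_handle.IsAllocated)
            _handle.Free();
    }
}

Note that byte arrays of 85,000 bytes or more land on the large object heap, which is not compacted by default, so big buffers effectively stay put even without explicit pinning.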
In general, reusing buffers is a good idea. Unfortunately, that means you want to avoid string and similar constructs in code like this: string being immutable means that every single line you read has to be a separate object. Fortunately, you don't actually need string in your case; a plain byte[] buffer works just as well, you just look for the bytes 13, 10 (that is, 0x0D, 0x0A) instead of "\r\n". The main problem is that you need to keep a lot of data in memory at the same time: you either need to minimize that, or make sure the buffers are allocated where they serve you best; for the file data, a buffer on the large object heap (LOH) will do nicely.
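A tiny sketch of what the byte-level search might look like (the FindCrLf helper is mine, not from your code):

static class LineScanner
{
    // Finds the next "\r\n" (bytes 13, 10) in a buffer without creating any strings.
    // Returns the index of the '\r', or -1 if no complete terminator is in range.
    public static int FindCrLf(byte[] buffer, int start, int count)
    {
        int end = start + count - 1;            // need room for both bytes
        for (int i = start; i < end; i++)
        {
            if (buffer[i] == 13 && buffer[i + 1] == 10)
                return i;
        }
        return -1;
    }
}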
One way to avoid so many allocations is to scan through the file looking for line ends, remembering only the offset of the line you will eventually want to start copying from. As you go (using a reusable byte[] buffer), you simply keep updating the offset of "the line no more than 100,000 lines from the end"; you never allocate and free the lines themselves. Of course, this means you have to read some of the data twice, but that's simply the price of processing data that isn't fixed-length and/or indexed :)
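A rough sketch of that two-pass approach, under assumptions of my own (lines end with '\n', and the TailCopy.CopyLastLines helper, the placeholder paths and the 1 MiB buffer size are all made up for the example):

using System;
using System.IO;

static class TailCopy
{
    // Pass 1: remember the byte offsets of the last `lineCount` line starts
    // (a circular array, so memory use stays fixed).
    // Pass 2: copy everything from the oldest remembered offset onward.
    public static void CopyLastLines(string sourcePath, string targetPath, int lineCount)
    {
        var starts = new long[lineCount];       // ring of line-start offsets
        starts[0] = 0;                          // the first line starts at offset 0
        long seen = 1;
        var buffer = new byte[1 << 20];         // reusable 1 MiB buffer

        using (var source = File.OpenRead(sourcePath))
        {
            long position = 0;
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] != 10)        // '\n' ends a line...
                        continue;
                    long lineStart = position + i + 1;
                    if (lineStart < source.Length)  // ...unless it is the trailing one
                    {
                        starts[(int)(seen % lineCount)] = lineStart;
                        seen++;
                    }
                }
                position += read;
            }

            // Oldest offset still in the ring = where the last `lineCount` lines begin.
            long copyFrom = seen <= lineCount ? 0 : starts[(int)(seen % lineCount)];

            // Pass 2: stream from that offset into the target, reusing the buffer.
            source.Seek(copyFrom, SeekOrigin.Begin);
            using (var target = File.Create(targetPath))
            {
                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                    target.Write(buffer, 0, read);
            }
        }
    }
}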
Another approach is to read the file from the end. How well this works is hard to predict, since it depends heavily on how well the OS and file system handle backwards reads. In some cases it is as good as a forward read: both are sequential reads, it is just a question of whether the OS/FS is smart enough to recognize that. In other cases it will be very expensive; if so, use large file buffers (say, 16 MiB instead of the more usual 4 kiB) to squeeze out as much sequential reading as possible. Reading from the back still doesn't let you stream the data straight into another file (you would have to combine it with the first approach, or once again keep all 100,000 lines in memory at once), but it does mean you only ever read data you are actually going to use (over-reading by at most the size of your buffer).
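If you want to experiment with the backwards scan, it might look roughly like this (the 16 MiB chunks, '\n' terminators and a trailing newline at the end of the file are all assumptions of mine; the helper only locates the offset where the tail begins):

using System;
using System.IO;

static class ReverseScan
{
    // Scans the file backwards in large chunks and returns the offset where the
    // last `lineCount` lines begin. Assumes the file ends with a trailing '\n'.
    public static long FindTailStart(string path, int lineCount)
    {
        var buffer = new byte[16 * 1024 * 1024];    // 16 MiB chunks
        using (var file = File.OpenRead(path))
        {
            long position = file.Length;
            int newlines = 0;

            while (position > 0)
            {
                int toRead = (int)Math.Min(buffer.Length, position);
                position -= toRead;
                file.Seek(position, SeekOrigin.Begin);

                int read = 0;                       // fill the chunk completely
                while (read < toRead)
                {
                    int n = file.Read(buffer, read, toRead - read);
                    if (n == 0) break;
                    read += n;
                }

                for (int i = read - 1; i >= 0; i--)
                {
                    if (buffer[i] == 10 && ++newlines > lineCount)
                        return position + i + 1;    // first byte of the tail
                }
            }
        }
        return 0;                                   // the whole file fits in the tail
    }
}

The returned offset could then be fed into the streaming copy shown at the end of this answer.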
Finally, if all else fails, you can do some of the work in unmanaged memory. I hope I don't have to say that this is much more complicated than working with managed memory; you have to be very careful about correct addressing and bounds checking, among other things. For a task like yours it is still quite manageable: in the end you are just shuffling a lot of bytes around with very little actual "work". You had better understand the unmanaged world well, though, otherwise it just leads to bugs that are very hard to track down and fix.
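Just to show the mechanics (and where the responsibility for bounds checking now lies), a purely illustrative round trip through unmanaged memory could look like this:

using System;
using System.Runtime.InteropServices;

static class UnmanagedCopy
{
    // Copies a slice of a managed array into an unmanaged block and back.
    // The point of the explicit checks: nobody does them for you any more.
    public static void RoundTrip(byte[] data, int offset, int count)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (offset < 0 || count < 0 || data.Length - offset < count)
            throw new ArgumentOutOfRangeException("count");

        IntPtr block = Marshal.AllocHGlobal(count);
        try
        {
            Marshal.Copy(data, offset, block, count);   // managed -> unmanaged
            Marshal.Copy(block, data, offset, count);   // unmanaged -> managed
        }
        finally
        {
            Marshal.FreeHGlobal(block);                 // never leak on exceptions
        }
    }
}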
EDIT:
Since you have made it clear that the "last 100,000 items" is a workaround rather than the desired solution, the easiest thing is to simply stream the data instead of reading everything into RAM and writing it all out at once. If File.Copy / File.Move aren't enough for you, you can use something like this:
var buffer = new byte[4096];

using (var sourceFile = File.OpenRead(...))
using (var targetFile = File.Create(...))
{
    while (true)
    {
        var bytesRead = sourceFile.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0) break;

        targetFile.Write(buffer, 0, bytesRead);
    }
}
The only memory you need is for the (relatively small) buffer.