How do I get control over a 5 GB heap in Haskell?

I am currently experimenting with a small Haskell web server, written with Snap, that loads a large amount of data and serves it to clients. And I have great difficulty gaining control over the server process. At random moments the process burns a lot of CPU for seconds to minutes and becomes unresponsive to client requests. Sometimes memory usage spikes (and sometimes drops) by hundreds of megabytes within seconds.

Hopefully somebody has more experience with long-running Haskell processes that use lots of memory and can give me some pointers to make this thing more stable. I have been debugging it for days now and I am starting to get a bit desperate here.

A small overview of my setup:

  • On server start I read about 5 gigabytes of data into a big (nested) Data.Map structure in memory. The nested map is value-strict, and all values inside the map are data types whose fields are all strict as well. I put a lot of time into making sure no unevaluated thunks are left. The import takes (depending on system load) about 5-30 minutes. The strange thing is that the fluctuation between consecutive runs is far bigger than I would expect, but that is a different problem.

  • The big data structure lives inside a TVar that is shared by all the client threads spawned by the Snap server. Clients can request arbitrary parts of the data using a small query language. The amount of data requested is usually small (up to 300 kB or so) and touches only a small part of the data structure. All read-only requests are done with readTVarIO, so they do not require any STM transactions.

  • The server is started with the following flags: +RTS -N -I0 -qg -qb. This starts the server in multi-threaded mode and disables the idle GC and the parallel GC. This seems to speed things up a lot.
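A minimal sketch of the setup described above, assuming the details are roughly as stated; all names here (UserRecord, DB, runQuery) are illustrative, not taken from the actual server:

```haskell
-- Sketch: a value-strict nested Map shared through a TVar, with
-- read-only access via readTVarIO (no STM transaction needed).
import Control.Concurrent.STM (TVar, newTVarIO, readTVarIO)
import qualified Data.Map.Strict as M

-- Strict fields keep thunks from piling up inside the records.
data UserRecord = UserRecord
  { urEmail :: !String
  , urAge   :: !Int
  } deriving (Show, Eq)

-- Outer map keyed by "table" name, inner map keyed by record id.
type DB = M.Map String (M.Map String UserRecord)

-- Read-only queries use readTVarIO: a cheap snapshot read that
-- cannot retry or conflict with writers.
runQuery :: TVar DB -> String -> String -> IO (Maybe UserRecord)
runQuery var table key = do
  db <- readTVarIO var
  pure (M.lookup table db >>= M.lookup key)

main :: IO ()
main = do
  let db = M.fromList
        [ ("users", M.fromList
            [ ("alice", UserRecord "alice@example.com" 30) ]) ]
  var <- newTVarIO db
  r <- runQuery var "users" "alice"
  print (urEmail <$> r)
```

Using Data.Map.Strict ensures values are forced when inserted; strictness annotations on the record fields push that one level deeper.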

The server mostly runs without problems. However, every now and then a client request times out and the CPU peaks to 100% (or even over 100%) and keeps doing so for a long while. Meanwhile the server does not respond to requests anymore.

I can think of a few reasons that might cause the CPU usage:

  • The request just takes a lot of time because there is a lot of work to be done. This is somewhat unlikely, because it sometimes happens for requests that have proven to be very fast in previous runs (by fast I mean 20-80 ms or so).

  • There are still some unevaluated thunks that need to be computed before the data can be processed and sent to the client. This is also unlikely, for the same reason as the previous point.

  • Somehow a garbage collection kicks in and starts scanning my entire 5 GB heap. I can imagine this takes up a lot of time.

The problem is that I have no clue how to figure out what exactly is going on, nor what to do about it. Because the import process takes so long, profiling results do not show me anything useful. There seems to be no way to conditionally turn the profiler on and off from within code.
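While the profiler cannot be toggled from code, basic GC statistics can be read at runtime. A small sketch, assuming a modern GHC (8.2 or later, where GHC.Stats.getRTSStats and getRTSStatsEnabled exist; the binary must be started with +RTS -T, otherwise the stats are simply unavailable):

```haskell
-- Sketch: reading GC statistics from inside the program, as a
-- lightweight alternative to full profiling.
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Print a one-line GC summary, or a note if stats were not enabled.
logGC :: IO ()
logGC = do
  enabled <- getRTSStatsEnabled   -- False unless started with +RTS -T
  if not enabled
    then putStrLn "RTS stats not enabled (run with +RTS -T)"
    else do
      s <- getRTSStats
      putStrLn $ "collections: " ++ show (gcs s)
              ++ ", major: " ++ show (major_gcs s)
              ++ ", gc elapsed ns: " ++ show (gc_elapsed_ns s)

main :: IO ()
main = logGC
```

Calling something like logGC before and after a slow request would show whether a major collection happened in between.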

I personally suspect the GC is the problem here. I am using GHC 7, which seems to have a lot of options to tweak how the GC works.

What GC settings do you recommend when using large heaps with generally very stable data?

+46
performance garbage-collection memory-management haskell
Jul 08 '11
2 answers

The large memory use and the occasional CPU spikes are almost certainly the GC at work. You can see whether this is indeed the case by using RTS options such as -B, which causes GHC to beep whenever there is a major collection, -t, which will tell you statistics after the fact (in particular, see whether the GC times are really long), or -Dg, which turns on debugging info for GC calls (though you need to compile with -debug).

There are several things you can do to alleviate this problem:

  • On the initial import of the data, GHC wastes a lot of time growing the heap. You can tell it to grab all the memory you need at once by specifying a large -H.

  • A large heap with stable data will get promoted to an old generation. If you increase the number of generations with -G, you can get the stable data into the oldest, very rarely GC'd generation, whereas the more traditional young and old heaps sit above it.

  • Depending on the memory use of the rest of the application, you can use -F to tweak how much GHC will let the old generation grow before collecting it again. You may be able to tune this parameter so that it is effectively never garbage collected.

  • If there are no writes, and you have a well-defined interface, it may be worthwhile making this memory un-managed by GHC (using the C FFI) so there is no chance of a super-GC ever.

These are all speculation, so please test with your particular application.
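Concretely, the flags above might be combined into an invocation like the following. The numbers are purely illustrative and need to be tuned against the real workload:

```shell
#   -H6G  : pre-size the heap so the 5 GB import does not repeatedly grow it
#   -G3   : three generations, letting the stable data settle into a
#           rarely collected oldest generation
#   -F1.5 : let the old generation grow 1.5x before collecting it again
#           (default is 2)
#   -t    : print GC and allocation statistics on exit
./server +RTS -N -I0 -qg -H6G -G3 -F1.5 -t -RTS
```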

+29
Jul 08 '11 at 12:45
---

I had a very similar problem with a 1.5 GB heap of nested Maps. With the idle GC on by default I would get 3-4 seconds of freeze on every GC, and with the idle GC off (+RTS -I0) I would get 17 seconds of freeze after a few hundred queries, causing client time-outs.

My "solution" was the first to increase the client's waiting time and ask people to tolerate this, while 98% of the requests were about 500 ms, about 2% of the requests would be slow. However, wanting a better solution, I ended up working with two load-balanced servers and disconnected them from the cluster to run the GC every 200 requests, and then returned to action.

Adding insult to injury, this was a rewrite of an original Python program, which never had any such problems. In fairness, we did get about a 40% performance improvement, dead-easy parallelization and a more stable codebase. But this pesky GC problem...

+2
04 Oct '13