Ruby OOM in container

We recently ran into a problem with Ruby inside a Docker container. Despite fairly low load, the application tends to consume huge amounts of memory and, after a while under that load, it OOMs.

After some research, we narrowed the problem down to a one-liner:

docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;' 

Shortly after start, some machines OOMed (the process was killed by the oom-killer because the limit was exceeded), while on others the same command ran, albeit slowly, without OOM. It seems (and only seems, perhaps this is not so) that in some configurations Ruby is able to detect the cgroup limits and configure its GC accordingly.
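
To check whether the limit is even visible inside the container, one can read the cgroup memory file directly (a quick sketch; the path assumes Docker's default cgroup v1 layout on hosts of that era):

docker run -ti -m 209715200 ruby:2.1 ruby -e 'puts "cgroup limit: %d MB" % (File.read("/sys/fs/cgroup/memory/memory.limit_in_bytes").to_i / 1024 / 1024)'

If this prints roughly 200 MB on both the OOMing and the non-OOMing hosts, the difference is unlikely to be in what the container can see.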

Checked configurations:

  • CentOS 7, Docker 1.9 - OOM
  • CentOS 7, Docker 1.12 - OOM
  • Ubuntu 14.10, Docker 1.9 - OOM
  • Ubuntu 14.10, Docker 1.12 - OOM
  • MacOS X, Docker 1.12 - No OOM
  • Fedora 23, Docker 1.12 - No OOM

If you look at the memory consumption of the ruby process, in all cases it behaved much like the picture below: it either stayed at roughly the same level just under the limit, or ran into the limit and was killed.

Memory consumption graph

We want to avoid OOM at all costs, as it reduces fault tolerance and creates a risk of data loss. The memory the application actually needs is below the limit.

Do you have any suggestions on how to configure ruby to avoid OOMing, possibly at the cost of some performance?

We cannot figure out what the significant differences between the tested installations are.

Edit: changing the code or increasing the memory limit is not an option. The former because we run community plugins we do not control, the latter because it does not guarantee that we will not run into this problem again in the future.

+6
3 answers

You can try to configure ruby garbage collection through environment variables (depending on your ruby version):

 RUBY_GC_MALLOC_LIMIT=4000100 RUBY_GC_MALLOC_LIMIT_MAX=16000100 RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.1 

Or call garbage collection manually via GC.start.

As an example, try

docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; array = nil; end;'

to help the garbage collector.
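
A further variant (just a sketch, using only standard MRI APIs) additionally forces a GC cycle after dropping the reference, which makes the reclaim point explicit rather than leaving it to the allocator thresholds:

docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; array = nil; GC.start; end;'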

Edit:

I do not have an environment comparable to yours. On my machine (Ubuntu 14.04.5 LTS, Docker 1.12.3, 4 GB RAM, Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz) the following looks pretty promising:

 docker run -ti -m 500MB \
   -e "RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1" \
   -e "RUBY_GC_MALLOC_LIMIT=5242880" \
   -e "RUBY_GC_MALLOC_LIMIT_MAX=16000100" \
   -e "RUBY_GC_HEAP_INIT_SLOTS=500000" \
   ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; puts `ps -o rss -p #{Process::pid}`.chomp.split("\n").last.strip.to_i / 1024.0 / 1024; puts GC.stat; end;'

But every ruby application needs its own fine-tuning, and if you have memory leaks, you are lost.
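
To tell a tuning problem from a genuine leak, it can help to log RSS together with a couple of GC.stat counters over time: if RSS keeps growing while the live-object count stays flat, the growth is likely happening outside the Ruby object heap. A rough sketch (the GC.stat key names below are the Ruby 2.1 ones; later versions renamed them):

loop do
  rss_mb = `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
  stats  = GC.stat
  # :heap_live_slot and :count are Ruby 2.1 key names
  # (2.2+ uses :heap_live_slots and keeps :count).
  puts "rss=#{rss_mb.round(1)}MB gc_runs=#{stats[:count]} live_slots=#{stats[:heap_live_slot]}"
  sleep 10
end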

+1

I do not think this is a Docker problem. You are overcommitting the container's resources, and Ruby tends to misbehave once you hit memory thresholds. It may well garbage-collect, but if another process tries to grab memory, or Ruby tries to allocate again while you are over the limit, the kernel will (usually) kill the process with the most memory. If you are concerned about memory usage on the server, add threshold alerts at 80% of RAM and allocate appropriately sized resources for the job. When you start hitting thresholds, either allocate more RAM or look at the particular job's parameters/allocations to see whether it needs to be redesigned for a smaller footprint.

Another potential option, if you really want a firm fixed memory ceiling for the GC to work against, is to use JRuby and set the JVM's maximum memory to leave a little headroom within the container's memory limit. The JVM handles OOM better in its own context, since it does not hand those resources over to other processes or let the kernel think the server is dying.
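
As an illustration only (the jruby image tag is an assumption; -J- simply forwards options to the JVM), a JRuby equivalent of the test command with a heap cap below the container limit could look like:

docker run -ti -m 209715200 jruby:9.1 jruby -J-Xmx150m -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;'

With -Xmx kept below the container limit, over-allocation surfaces as a Java OutOfMemoryError inside the process rather than the kernel's oom-killer terminating the container.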

+2

I had a similar problem with several Java-based Docker containers running on the same Docker host. The problem was that each container saw the host machine's total available memory and assumed it could use all of it for itself. GC did not run very often, and I was getting out-of-memory exceptions. I ended up manually limiting the amount of memory each container could use, and I no longer get OOMs. Inside the container, I also limited the JVM's memory.

Not sure if this is the problem you are seeing, but it may be related.

https://docs.docker.com/engine/reference/run/#/runtime-constraints-on-resources
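
For illustration, constraining memory at both levels could look something like the following (the image name and the JAVA_OPTS convention are placeholders; whether your image honors JAVA_OPTS is an assumption):

docker run -d -m 512m --memory-swap 512m -e JAVA_OPTS="-Xmx384m" some-java-image

Here -m caps the container, setting --memory-swap to the same value disables extra swap, and the JVM heap stays below the container cap so GC kicks in before the kernel limit is reached.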

+2

Source: https://habr.com/ru/post/1011725/

