RabbitMQ 3.6.5 crash with high memory usage

Our cluster consists of 3 disk nodes in HA. Each node has 4 CPUs and 26 GB of RAM. We are running RabbitMQ 3.6.5 on Erlang 17.3. The only plugin enabled is the management plugin.

The problem is that, usually within 3 hours, one of the servers (typically the one with the most queues) gradually fills up its memory until it crashes. This happens daily, and we see no reason for it in the logs.

Attached below is the status output for the server at a point when it had grown to 21 GB of memory; at that same moment the overview panel in the management interface showed only 2 GB in use. When this happens, we usually have ~400 connections with ~470 channels, 16 exchanges, 54 queues, and ~300 users. One of the queues has a TTL, 4 are priority queues, and all queues are long-lived.

When the service is restarted, everything returns to normal.

Any ideas as to what causes this, or how we should approach debugging it? Is there a checklist of known issues to rule out?

Status of node 'rabbit@scraped-node-name' ...
[{pid,399},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
      {webmachine,"webmachine","1.10.3"},
      {mochiweb,"MochiMedia Web Server","2.13.1"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
      {rabbit,"RabbitMQ","3.6.5"},
      {os_mon,"CPO  CXC 138 46","2.3"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.5"},
      {rabbit_common,[],"3.6.5"},
      {mnesia,"MNESIA  CXC 138 12","4.12.3"},
      {ssl,"Erlang/OTP SSL application","5.3.6"},
      {public_key,"Public key infrastructure","0.22.1"},
      {crypto,"CRYPTO","3.4.1"},
      {inets,"INETS  CXC 138 49","5.10.3"},
      {compiler,"ERTS  CXC 138 10","5.0.2"},
      {xmerl,"XML parser","1.3.7"},
      {syntax_tools,"Syntax tools","1.6.16"},
      {asn1,"The Erlang ASN1 compiler version 3.0.2","3.0.2"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
      {sasl,"SASL  CXC 138 11","2.4.1"},
      {stdlib,"ERTS  CXC 138 10","2.2"},
      {kernel,"ERTS  CXC 138 10","3.0.3"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true]\n"},
 {memory,
     [{total,2049513496},
      {connection_readers,10231416},
      {connection_writers,3215768},
      {connection_channels,35753016},
      {connection_other,14065960},
      {queue_procs,430585272},
      {queue_slave_procs,34912},
      {plugins,525312},
      {other_proc,33015816},
      {mnesia,333080},
      {mgmt_db,33680},
      {msg_index,38121640},
      {other_ets,6595504},
      {binary,1432921304},
      {code,27606184},
      {atom,992409},
      {other_system,15482223}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,10983032422},
 {disk_free_limit,50000000},
 {disk_free,148079415296},
 {file_descriptors,
     [{total_limit,32668},
      {total_used,440},
      {sockets_limit,29399},
      {sockets_used,418}]},
 {processes,[{limit,1048576},{used,4683}]},
 {run_queue,0},
 {uptime,117601},
 {kernel,{net_ticktime,60}}]
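
As a rough sanity check of the numbers in the status output above, here is a minimal Python sketch that ranks the memory categories and back-calculates the RAM implied by the watermark settings. It is only an illustration: the figures are copied by hand from the output, and the helper name `top_categories` is made up for this example, not part of any RabbitMQ tooling.

```python
# Byte counts copied by hand from the `memory` section of the status output.
memory = {
    "connection_readers": 10231416,
    "connection_writers": 3215768,
    "connection_channels": 35753016,
    "connection_other": 14065960,
    "queue_procs": 430585272,
    "queue_slave_procs": 34912,
    "plugins": 525312,
    "other_proc": 33015816,
    "mnesia": 333080,
    "mgmt_db": 33680,
    "msg_index": 38121640,
    "other_ets": 6595504,
    "binary": 1432921304,
    "code": 27606184,
    "atom": 992409,
    "other_system": 15482223,
}
reported_total = 2049513496          # {total, ...} from the same section
vm_memory_limit = 10983032422        # bytes
vm_memory_high_watermark = 0.4       # fraction of detected RAM

GIB = 2 ** 30


def top_categories(breakdown, n=3):
    """Return the n largest categories as (name, bytes, percent of sum)."""
    total = sum(breakdown.values())
    ranked = sorted(breakdown.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, size, 100.0 * size / total) for name, size in ranked[:n]]


if __name__ == "__main__":
    for name, size, pct in top_categories(memory):
        print(f"{name:<12} {size / GIB:6.2f} GiB  {pct:5.1f}%")

    # RAM the node believes it has, derived from limit / watermark.
    implied_ram = vm_memory_limit / vm_memory_high_watermark
    print(f"implied RAM: {implied_ram / GIB:.2f} GiB")
    print(f"limit:       {vm_memory_limit / GIB:.2f} GiB")
```

On these figures the categories add up exactly to the reported total of about 2 GB, with `binary` accounting for roughly 70% and `queue_procs` for roughly 21%. The watermark limit works out to a little over 10 GiB, so the 21 GB seen at the OS level is well beyond anything this Erlang-side breakdown accounts for.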

Source: https://habr.com/ru/post/1660011/