Unable to kill processes (originating from a Docker container)

I run a Docker cluster with several thousand containers, and several times a day, at random, a process "gets stuck" and prevents its container from stopping. Below is an example of one such container, its corresponding process, and everything I tried in order to kill the container / process.

Container:

# docker ps | grep 950677e2317f
950677e2317f        7e553d1d9f6f                  "/bin/sh -c /minecraf"   2 days ago          Up 2 days           0.0.0.0:22661->22661/tcp, 0.0.0.0:22661->22661/udp, 0.0.0.0:37681->37681/tcp, 0.0.0.0:37681->37681/udp   gloomy_jennings

Trying to stop the container through the Docker daemon (it keeps trying forever without result, so I interrupted it):

# time docker stop --time=1 950677e2317f
^C
real    0m13.508s
user    0m0.036s
sys     0m0.008s

The daemon log during the stop attempt:

# journalctl -fu docker.service
-- Logs begin at Fri 2015-12-11 15:40:55 CET. --
Dec 31 23:30:33 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:33.164731953+01:00" level=info msg="POST /v1.21/containers/950677e2317f/stop?t=1"
Dec 31 23:30:34 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:34.165531990+01:00" level=info msg="Container 950677e2317fcd2403ef5b5ffad37204e880136e91f76b0a8682e04a93e80942 failed to exit within 1 seconds of SIGTERM - using the force"
Dec 31 23:30:44 m3561.contabo.host docker[9988]: time="2015-12-31T23:30:44.165954266+01:00" level=info msg="Container 950677e2317f failed to exit within 10 seconds of kill - trying direct SIGKILL"
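
The log shows the escalation that `docker stop --time=1` goes through: SIGTERM, a forced kill after the 1-second timeout, and a direct SIGKILL after another 10 seconds. To check whether the process ignores even SIGKILL outside Docker, the same TERM, wait, KILL pattern can be applied to a single host PID. The following is a minimal sketch; `stop_pid` is a hypothetical helper written for illustration, not part of Docker, and the 10-second wait after SIGKILL only mirrors the log.

```shell
#!/bin/sh
# Hypothetical helper: stop_pid PID [GRACE_SECONDS]
# Rough imitation of the escalation in the daemon log: SIGTERM, wait
# GRACE seconds, SIGKILL, then wait roughly 10 more seconds.
# This is NOT Docker's implementation, only the same idea for one PID.
stop_pid() {
  local pid grace i
  pid=$1
  grace=${2:-10}

  # Polite request first; if the PID is already gone, nothing to do.
  kill -s TERM "$pid" 2>/dev/null || return 0
  i=0
  while [ "$i" -lt $((grace * 10)) ]; do
    # kill -0 only tests existence (an unreaped zombie still "exists").
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 0.1
    i=$((i + 1))
  done

  # Grace period over: user space cannot catch or ignore SIGKILL.
  kill -s KILL "$pid" 2>/dev/null || return 0
  i=0
  while [ "$i" -lt 100 ]; do
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 0.1
    i=$((i + 1))
  done

  echo "pid $pid still exists about 10 s after SIGKILL" >&2
  return 1
}
```

For the process described here, `stop_pid 11991 1` would presumably end with the "still exists" message, in line with the `kill -9` attempt further down.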

A look at the processes on the host shows the stuck process, PID 11991 (not a zombie in the strict sense: its state is R, running, rather than Z):

# ps aux | grep [1]1991
root     11991 84.3  0.0   5836   132 ?        R    Dec30 1300:19 bash -c (echo stop > /tmp/minecraft &)
# top -b | grep [1]1991
11991 root      20   0    5836    132     20 R  89.5  0.0   1300:29 bash
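
Both `ps` and `top` report state R for PID 11991, i.e. running or runnable, while using roughly 85-90% of a CPU; a real zombie would show Z and use no CPU at all. To read that state straight from /proc, for example from a periodic watchdog script, a minimal sketch, assuming a Linux /proc; `proc_state` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: proc_state PID
# Prints the one-letter state from /proc/PID/status:
#   R = running/runnable, S = sleeping, D = uninterruptible sleep,
#   Z = zombie (exited, not yet reaped), T = stopped.
# Prints nothing if the PID does not exist.
proc_state() {
  awk '$1 == "State:" { print $2; exit }' "/proc/$1/status" 2>/dev/null
}
```

For the stuck process, `proc_state 11991` would be expected to print `R`, matching the STAT column above.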

And this really is a process from our container (note the container ID in the path):

# cat /proc/11991/mountinfo
...
/var/lib/docker/containers/950677e2317fcd2403ef5b5ffad37204e880136e91f76b0a8682e04a93e80942/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
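
The mountinfo check works; a somewhat more direct mapping from a host PID to a container is /proc/PID/cgroup, where cgroups created by Docker are named after the full container ID (for example `.../docker/<id>` with the cgroupfs driver, or `docker-<id>.scope` with the systemd driver; exact paths depend on the cgroup driver and version). A minimal sketch that extracts that ID, with a hypothetical function name; in the opposite direction, `docker inspect --format '{{.State.Pid}}' 950677e2317f` prints the host PID of the container's main process.

```shell
#!/bin/sh
# Hypothetical helper: container_id_from_cgroup
# Reads /proc/PID/cgroup-style lines on stdin and prints the first
# 64-hex-digit ID that follows "docker/" or "docker-".
# Prints nothing if no Docker cgroup is found.
container_id_from_cgroup() {
  grep -oE 'docker[-/][0-9a-f]{64}' | head -n 1 | cut -c8-
}

# Usage (as root on the host):
#   container_id_from_cgroup < /proc/11991/cgroup
```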

Attempting to kill the process has no effect:

# kill -9 11991
# ps aux | grep [1]1991
root     11991 84.3  0.0   5836   132 ?        R    Dec30 1303:58 bash -c (echo stop > /tmp/minecraft &)
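
`kill -9` returning silently while the process keeps running means the signal was accepted but never acted on. /proc/PID/status exposes the pending-signal sets as hex masks (SigPnd for the thread, ShdPnd for the whole process) in which bit n-1 stands for signal n, so SIGKILL (9) is 0x100. The sketch below, with hypothetical helper names, reports whether SIGKILL is pending and dumps the kernel stack (root only). A pending SIGKILL on a task in state R would be consistent with the task looping inside the kernel, since a running task only handles pending signals on its way back to user space; that is one plausible reading, not something the output above proves.

```shell
#!/bin/sh
# Hypothetical helper: has_pending_kill HEXMASK
# Succeeds if signal 9 (SIGKILL) is set in a SigPnd/ShdPnd mask.
# Only the last 4 hex digits (signals 1..16) are evaluated, which avoids
# 64-bit overflow when real-time signals set the high bits.
has_pending_kill() {
  local mask low
  mask=$1
  [ "${#mask}" -ge 4 ] || return 1
  low=${mask#"${mask%????}"}
  case $low in *[!0-9a-fA-F]*) return 1 ;; esac
  [ $((0x$low & 0x100)) -ne 0 ]
}

# Hypothetical helper: inspect_stuck PID
# Prints both pending-signal masks, then the task's kernel stack.
inspect_stuck() {
  local pid field mask
  pid=$1
  for field in SigPnd ShdPnd; do
    mask=$(awk -v f="$field:" '$1 == f { print $2 }' "/proc/$pid/status" 2>/dev/null)
    if has_pending_kill "$mask"; then
      echo "$field: $mask (SIGKILL pending)"
    else
      echo "$field: $mask"
    fi
  done
  # Readable only by root; absent on some kernels.
  cat "/proc/$pid/stack" 2>/dev/null || echo "(cannot read /proc/$pid/stack)"
}

# Usage (as root): inspect_stuck 11991
```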

Some system details:

# docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:20:08 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:20:08 UTC 2015
 OS/Arch:      linux/amd64

# docker info
Containers: 189
Images: 322
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 700
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.0-19-generic
Operating System: Ubuntu 15.10
CPUs: 24
Total Memory: 125.8 GiB
Name: m3561.contabo.host
ID: ZM2Q:RA6Q:E4NM:5Q2Q:R7E4:BFPQ:EEVK:7MEO:YRH6:SVS6:RIHA:3I2K

# uname -a
Linux m3561.contabo.host 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

, . - -. (, node 3-7 ), .

, ?

, , . , GitHub.

, , , Linux 4.19+. , .

UPDATE: 3. * - . , , .

, overlay2 . ( ). , aufs storage , .

Source: https://habr.com/ru/post/1622387/

