Ruby / Glibc coredump (double free or corruption)

Question

I use a distributed continuous integration tool that I wrote myself in Ruby. It uses Mike Perham's "politics" plugin to distribute tasks. The politics module uses threads for its mDNS part.

From time to time I get a core dump that I don't understand:

    *** glibc detected *** ruby: double free or corruption (fasttop): 0x086d8600 ***
    ======= Backtrace: =========
    /lib/libc.so.6[0xb7cef494]
    /lib/libc.so.6[0xb7cf0b93]
    /lib/libc.so.6(cfree+0x6d)[0xb7cf3c7d]
    /usr/lib/libruby18.so.1.8[0xb7e8adf8]
    /usr/lib/libruby18.so.1.8(ruby_xmalloc+0x85)[0xb7e8b395]
    /usr/lib/libruby18.so.1.8[0xb7e5065e]
    ...
    /usr/lib/libruby18.so.1.8[0xb7e717f4]
    /usr/lib/libruby18.so.1.8[0xb7e74296]
    /usr/lib/libruby18.so.1.8(rb_yield+0x27)[0xb7e7fb57]
    ======= Memory map: ========
    ...

I am running on Gentoo and rebuilt Ruby and glibc with "-ggdb" and with stripping disabled to get a meaningful core:

    ...
    Core was generated by `ruby /home/develop/dcc/bin/dcc-worker'.
    Program terminated with signal 6, Aborted.
    #0 0xb7f20410 in __kernel_vsyscall ()
    (gdb) bt
    #0 0xb7f20410 in __kernel_vsyscall ()
    #1 0xb7cacb60 in *__GI___open_catalog (cat_name=0x6 <Address 0x6 out of bounds>, nlspath=0xbf9d6f00 " ", env_var=0x0, catalog=0x1) at open_catalog.c:237
    #2 0xb7cae498 in __sigdelset (set=0x6) from /lib/libc.so.6
    #3 *__GI_sigfillset (set=0x6) at ../signal/sigfillset.c:42
    #4 0xb7ce952d in freopen64 (filename=0x2 <Address 0x2 out of bounds>, mode=0xb7db02c8 "\" total=\"%zu\" count=\"%zu\"/>\n", fp=0x9) at freopen64.c:47
    #5 0xb7cef494 in _IO_str_init_readonly (sf=0x86d8600, ptr=0xb7eef5a9 "te\213V\b\205\322\017\204\220", size=-1210273804) at strops.c:88
    #6 0xb7cf0b93 in mALLINFo (av=0xb) at malloc.c:5865
    #7 0xb7cf3c7d in __libc_calloc (n=141395456, elem_size=3214793136) at malloc.c:4019
    #8 0xb7e8adf8 in ?? () at gc.c:1390 from /usr/lib/libruby18.so.1.8
    #9 0x086d8600 in ?? ()
    #10 0xb7e89400 in rb_gc_disable () at gc.c:256
    #11 0xb7e8b395 in add_freelist () at gc.c:1087
    #12 gc_sweep () at gc.c:1186
    #13 garbage_collect () at gc.c:1524
    #14 0xb7e5065e in ?? () from /usr/lib/libruby18.so.1.8
    #15 0x00000340 in ?? ()
    #16 0x00000000 in ?? ()
    (gdb)

Hmm? To me this looks like something entirely internal to Ruby. In other questions about "double free or corruption" here on Stack Overflow, I saw that threads may be part of the problem.

Also, the problem does not always occur at exactly the same position. I have another backtrace, which is much longer; the crash is again in garbage_collect, but via a slightly different path:

    (gdb) bt
    #0 0xffffe430 in __kernel_vsyscall ()
    #1 0xf7c8b8c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #2 0xf7c8d1f5 in *__GI_abort () at abort.c:88
    #3 0xf7cc7e35 in __libc_message (do_abort=2, fmt=0xf7d8daa8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
    #4 0xf7ccdd24 in malloc_printerr (action=2, str=0xf7d8dbec "double free or corruption (fasttop)", ptr=0x911f5d0) at malloc.c:6197
    #5 0xf7ccf403 in _int_free (av=0xf7daa380, p=0x911f5c8) at malloc.c:4750
    #6 0xf7cd24ad in *__GI___libc_free (mem=0x911f5d0) at malloc.c:3716
    #7 0xf7e68768 in obj_free () at gc.c:1366
    #8 gc_sweep () at gc.c:1174
    #9 garbage_collect () at gc.c:1524
    #10 0xf7e68be5 in rb_newobj () at gc.c:436
    #11 0xf7eb9840 in str_alloc (klass=0) at string.c:67
    ... (150 lines of rb_eval/call/yield etc.)

Does anyone have a suggestion on how to isolate and possibly solve this problem?

+4

multithreading linux ruby coredump glibc

Tilo Prütz Feb 10 '10 at 8:32

4 answers

Valgrind makes it easy to track down heap corruption problems. Some false positives are reported when running Ruby 1.8 under valgrind, but they can be eliminated using this ruby patch (and configuring with --enable-valgrind) or by using a valgrind suppression file. To run your Ruby program under valgrind, simply prefix the command with valgrind:

 valgrind ruby /home/develop/dcc/bin/dcc-worker

If the crashing process is a child of the process you start, use valgrind --trace-children=yes. Look in particular for invalid writes, which are a sign of heap corruption.

+2

mark4o Feb 15 '10 at 23:18

I got the same error in a simple C program called rd_test; it simply reads a given number of bytes, using read(2), from a given input file (possibly a device file).

The actual bug turned out to be a buffer overflow of 1 byte (I did ... buf[n] = '\0'; ... where "n" is the number of bytes read into the buffer "buf"). Stupid me.

BUT, the fact is that I never caught this until I ran it under valgrind! So IMHO valgrind is definitely worth running in such cases.

The "double free or corruption" error disappeared as soon as I got rid of the offending bug.
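
The off-by-one described in this answer can be sketched roughly as follows. This is only an illustration, not the actual rd_test source: the helper name read_terminated and its signature are assumptions made for the example. When read(2) fills the buffer completely, n equals the buffer size and buf[n] = '\0' stores one byte past the end; asking read(2) for at most bufsize - 1 bytes keeps the terminator inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from rd_test): read up to bufsize - 1
 * bytes from fd into buf and NUL-terminate the result.
 *
 * The buggy variant would call read(fd, buf, bufsize) and then do
 * buf[n] = '\0'; when n == bufsize, that store lands one byte past the
 * end of the buffer and can overwrite malloc's bookkeeping data.
 */
ssize_t read_terminated(int fd, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);

    /* Leave room for the terminator, so n is at most bufsize - 1. */
    ssize_t n = read(fd, buf, bufsize - 1);
    if (n < 0)
        return -1;      /* errno has been set by read(2) */

    buf[n] = '\0';      /* always inside the buffer */
    return n;
}
```

A caller would use it as read_terminated(fd, buf, sizeof buf) for an array buf, or pass the size that was given to malloc for a heap buffer.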

+1

Kaiwan Billimoria Jan 29 '11 at 12:39

I got the same error message, not in Ruby but in a zenity program. I found that it happened because I closed an open pipe twice! Check whether you are freeing the same heap memory two or more times, for example by closing already-closed files or pipes again. Good luck.
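
One common guard against this kind of double release is to invalidate the handle as soon as it has been released. The sketch below assumes the pipe or file is a stdio stream (FILE *), which the answer does not actually state; the helper names fclose_once and free_once are hypothetical. Calling fclose() twice on the same stream is undefined behavior and, since glibc allocates the FILE object on the heap, it typically shows up as a double free.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: close *fp at most once.
 * After the first call *fp is NULL, so later calls do nothing.
 */
int fclose_once(FILE **fp)
{
    if (*fp == NULL)
        return 0;           /* already closed, nothing to release */

    int rc = fclose(*fp);
    *fp = NULL;             /* the stream is invalid even if fclose failed */
    return rc;
}

/*
 * Hypothetical helper: free *p and clear the pointer.
 * free(NULL) is defined to do nothing, so repeated calls are harmless.
 */
void free_once(char **p)
{
    free(*p);
    *p = NULL;
}
```

This only protects against releasing through the same variable twice; if a second copy of the pointer exists elsewhere, valgrind is still the tool to find the stray release.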

0

Eric Stockman Jul 21 '10 at 19:34

Eric Warmenhoven · Accepted Answer · 2010-02-15T20:45:27+0000

Quick, easy, and not all that useful: export MALLOC_CHECK_=2. This makes glibc do some extra checking during free() to detect heap corruption. It will abort() and produce a core dump as soon as it detects the corruption, instead of waiting until the corruption causes a problem later on.

Not so quick and easy, but much more useful (if you get it working): valgrind.
