I am working on a distributed continuous integration tool that I wrote myself in Ruby. It uses Mike Perham's “politics” gem to distribute tasks. The politics module uses threads for the mDNS part.
From time to time, the worker crashes with a core dump that I don't understand:
*** glibc detected *** ruby: double free or corruption (fasttop): 0x086d8600 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7cef494]
/lib/libc.so.6[0xb7cf0b93]
/lib/libc.so.6(cfree+0x6d)[0xb7cf3c7d]
/usr/lib/libruby18.so.1.8[0xb7e8adf8]
/usr/lib/libruby18.so.1.8(ruby_xmalloc+0x85)[0xb7e8b395]
/usr/lib/libruby18.so.1.8[0xb7e5065e]
...
/usr/lib/libruby18.so.1.8[0xb7e717f4]
/usr/lib/libruby18.so.1.8[0xb7e74296]
/usr/lib/libruby18.so.1.8(rb_yield+0x27)[0xb7e7fb57]
======= Memory map: ========
...
I am running Gentoo and rebuilt Ruby and glibc with "-ggdb" and with stripping disabled to get a meaningful core dump:
...
Core was generated by `ruby /home/develop/dcc/bin/dcc-worker'.
Program terminated with signal 6, Aborted.
Hmm. To me, this looks like it is entirely internal to Ruby. In other questions about "double free or corruption" here on Stack Overflow, I saw that threads may be part of the problem.
Also, the problem does not always occur at exactly the same position. I have another backtrace, which is much longer, where the crash is also in garbage_collect, but reached via a slightly different path:
(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf7c8b8c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xf7c8d1f5 in *__GI_abort () at abort.c:88
#3 0xf7cc7e35 in __libc_message (do_abort=2, fmt=0xf7d8daa8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#4 0xf7ccdd24 in malloc_printerr (action=2, str=0xf7d8dbec "double free or corruption (fasttop)", ptr=0x911f5d0) at malloc.c:6197
#5 0xf7ccf403 in _int_free (av=0xf7daa380, p=0x911f5c8) at malloc.c:4750
#6 0xf7cd24ad in *__GI___libc_free (mem=0x911f5d0) at malloc.c:3716
#7 0xf7e68768 in obj_free () at gc.c:1366
#8 gc_sweep () at gc.c:1174
#9 garbage_collect () at gc.c:1524
#10 0xf7e68be5 in rb_newobj () at gc.c:436
#11 0xf7eb9840 in str_alloc (klass=0) at string.c:67
... (150 lines of rb_eval/call/yield etc.)
Does anyone have a suggestion on how to isolate and possibly fix this problem?