Tips for debugging segmentation faults in the absence of leaks

I wrote a C application that works fine, except with very large input datasets.

With large inputs, I get a segmentation fault late in the binary's run.

I ran the binary (with test input) under valgrind:

 valgrind --tool=memcheck --leak-check=yes /foo/bar/baz inputDataset > outputAnalysis

This job usually takes a few hours, but it took seven days under valgrind.

Unfortunately, at the moment I do not know how to read the results that I get from this run.

I get a lot of these warnings:

 ...
 ==4074== Conditional jump or move depends on uninitialised value(s)
 ==4074==    at 0x435900: ??? (in /foo/bar/baz)
 ==4074==    by 0x439CC5: ??? (in /foo/bar/baz)
 ==4074==    by 0x400BF2: ??? (in /foo/bar/baz)
 ==4074==    by 0x402086: ??? (in /foo/bar/baz)
 ==4074==    by 0x402A0F: ??? (in /foo/bar/baz)
 ==4074==    by 0x41684F: ??? (in /foo/bar/baz)
 ==4074==    by 0x4001B8: ??? (in /foo/bar/baz)
 ==4074==    by 0x7FEFFFF57: ???
 ==4074==  Uninitialised value was created
 ==4074==    at 0x461D3A: ??? (in /foo/bar/baz)
 ==4074==    by 0x43F926: ??? (in /foo/bar/baz)
 ==4074==    by 0x416B9B: ??? (in /foo/bar/baz)
 ==4074==    by 0x416725: ??? (in /foo/bar/baz)
 ==4074==    by 0x4001B8: ??? (in /foo/bar/baz)
 ==4074==    by 0x7FEFFFF57: ???
 ...

There are no hints about source locations, no variable names, etc. What can I do with this information?

At the very end, I get the following error, but, as with smaller datasets that don't crash, valgrind detects no leaks:

 ...
 ==4074== Process terminating with default action of signal 11 (SIGSEGV)
 ==4074==  Access not within mapped region at address 0x7158E7F7
 ==4074==    at 0x7158E7F7: ???
 ==4074==    by 0x4020B8: ??? (in /foo/bar/baz)
 ==4074==    by 0x6322203A22656D6E: ???
 ==4074==    by 0x306C675F6E557267: ???
 ==4074==    by 0x202C22373232302F: ???
 ==4074==    by 0x6D616E656C696621: ???
 ==4074==    by 0x72686322203A2264: ???
 ==4074==    by 0x3030306C675F6E54: ???
 ==4074==    by 0x346469702E373231: ???
 ==4074==    by 0x646469662E34372F: ???
 ==4074==    by 0x722E64616568656B: ???
 ==4074==    by 0x63656D6F6C756764: ???
 ==4074==  If you believe this happened as a result of a stack
 ==4074==  overflow in your program main thread (unlikely but
 ==4074==  possible), you can try to increase the size of the
 ==4074==  main thread stack using the --main-stacksize= flag.
 ==4074==  The main thread stack size used in this run was 10485760.
 ==4074==
 ==4074== HEAP SUMMARY:
 ==4074==     in use at exit: 0 bytes in 0 blocks
 ==4074==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
 ==4074==
 ==4074== All heap blocks were freed -- no leaks are possible
 ==4074==
 ==4074== For counts of detected and suppressed errors, rerun with: -v
 ==4074== ERROR SUMMARY: 1603141870 errors from 86 contexts (suppressed: 0 from 0)
 Segmentation fault

Everything I allocate gets a matching free, after which I set the pointer to NULL.

At this point, what is the best way to debug this application to determine what else is causing the segmentation fault?

December 22, 2011 - Edit

I compiled a debug version of my binary called debug-binary using the following compilation flags:

 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1 -DUSE_ZLIB -g -O0 -Wformat -Wall -pedantic -std=gnu99

When I run it with valgrind, I don't get much more information:

 valgrind -v --tool=memcheck --leak-check=yes --error-limit=no --track-origins=yes debug-binary input > output

Here is a snippet of output:

 ==25116== 2 errors in context 14 of 14:
 ==25116== Invalid read of size 4
 ==25116==    at 0x4045E8: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x40682F: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x404F0C: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x401FA4: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x402016: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x403B27: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x40295E: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)
 ==25116==  Address 0x539f188 is 24 bytes inside a block of size 48 free'd
 ==25116==    at 0x4A05D21: free (vg_replace_malloc.c:325)
 ==25116==    by 0x401F6B: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x402016: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x403B27: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x40295E: ??? (in /foo/bar/debug-binary)
 ==25116==    by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)

Is this a problem with my binary or with the system library (libc) that my application depends on?

I also don't know how to interpret the ??? entries. Is there another compilation flag that would get valgrind to provide more information?

c memory-management debugging segmentation-fault valgrind

Alex reynolds Dec 19 '11 at 10:19

3 answers

wallyk · Answer 1 · 2011-12-19T22:30:56+0000

Valgrind is basically saying that there are no notable heap management problems. The program is segfaulting because of a less complicated programming error.

If it were me, I would

compile it with gcc -g,
enable core dump files (ulimit -c unlimited),
run the program normally,
let it crash,
and use gdb to inspect the core file and see what it was doing when it crashed:
 gdb (program file) (core file)
 bt
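
The core-dump step in the list above can also be done from inside the program, which helps when the job is launched by a script that does not inherit your shell's ulimit setting. This is a minimal sketch, assuming a POSIX system; the helper name enable_core_dumps is hypothetical, and it can only raise the soft limit as far as the hard limit allows.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit. This matches `ulimit -c unlimited` only when the hard limit is
 * itself unlimited. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* the soft limit may not exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it once at the top of main() is enough; whether a core file is actually written still depends on system settings outside the program's control.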

derobert · Answer 2 · 2011-12-19T22:45:01+0000

I don't believe valgrind can find every error where you overwrite a value on the stack (without overrunning the stack itself). So you may want to try gcc's -fstack-protector-all option.
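
A common source of this kind of stack overwrite is an unbounded copy into a fixed-size local array. Several of the "by" addresses in the asker's final trace decode as printable ASCII, which would be consistent with string data landing on top of saved return addresses, though that is only a guess from the output. The sketch below shows a bounded copy; copy_label and LABEL_SIZE are hypothetical names, not taken from the asker's code.

```c
#include <stdio.h>
#include <string.h>

#define LABEL_SIZE 16   /* illustrative buffer size, not from the question */

/* Hypothetical helper: copy `name` into a caller-provided buffer.
 * snprintf never writes more than dst_size bytes (including the
 * terminating NUL), so an over-long name is truncated instead of
 * overwriting neighbouring stack data.
 * Returns 1 if the whole name fit, 0 if it was truncated or failed. */
int copy_label(char *dst, size_t dst_size, const char *name)
{
    int needed = snprintf(dst, dst_size, "%s", name);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Checking the return value matters here: silently truncated input can cause its own bugs later, so a caller would normally report it rather than carry on.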

You should also try mudflap using -fmudflap (single-threaded) or -fmudflapth (multi-threaded).

Both mudflap and the stack protector should be much faster than valgrind.

In addition, it seems that you don't have debugging symbols, which makes the backtraces hard to read. Add -ggdb. You might also want to enable core file generation (try ulimit -c unlimited). That way, you can debug the process after a crash using gdb program core.

As @wallyk points out, your segfault may be pretty easy to find. For example, maybe you are dereferencing NULL, and gdb can point you to the exact line (or close to it, if you are not compiling with -O0). That would make sense if, for example, you are simply running out of memory with your large datasets, so malloc returns NULL, and you forgot to check for it somewhere.
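
One way to rule out the unchecked-malloc case is to route allocations through a wrapper that fails loudly. This is a sketch only; checked_alloc is a hypothetical name, and exiting on failure is an assumption about what the application wants.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate room for `count` elements of `size`
 * bytes each. On overflow or allocation failure it reports the problem
 * and exits, so no caller ever dereferences a NULL result. */
void *checked_alloc(size_t count, size_t size)
{
    /* Reject requests whose byte count would wrap around size_t. */
    if (size != 0 && count > SIZE_MAX / size) {
        fprintf(stderr, "allocation size overflow\n");
        exit(EXIT_FAILURE);
    }

    void *p = malloc(count * size);

    /* malloc(0) may legitimately return NULL, so only treat NULL as an
     * error when a non-zero number of bytes was requested. */
    if (p == NULL && count * size != 0) {
        fprintf(stderr, "out of memory requesting %zu bytes\n", count * size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

If the program then stops with the "out of memory" message on the large dataset instead of segfaulting, the crash was a missing NULL check rather than memory corruption.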

Finally, if nothing else makes sense, there is always the possibility of a hardware problem. But you would expect those to be fairly random, for example different values being corrupted on different runs. If you try another machine and the crash happens there too, a hardware problem is highly unlikely.

caf · Answer 3 · 2011-12-19T23:24:49+0000

"Conditional jump or move depends on uninitialised value(s)" is a serious error that you must fix. It means that the behavior of your program is affected by the contents of an uninitialised variable (including an uninitialised region of memory returned by malloc()).

To get readable backtraces from valgrind, you need to compile with -g.
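
To illustrate the uninitialised-value warning: memory from malloc() has indeterminate contents, so any field that is read before it is written triggers it. The sketch below uses calloc() so that every field starts in a known state; struct record and record_new are hypothetical names used only for illustration.

```c
#include <stdlib.h>

/* Hypothetical record type, for illustration only. */
struct record {
    int count;
    int is_valid;
};

/* Allocate a record with every field zeroed. With plain malloc() the
 * fields would be indeterminate, and a later `if (r->is_valid)` is
 * exactly what memcheck reports as a conditional jump that depends on
 * an uninitialised value. Returns NULL if the allocation fails. */
struct record *record_new(void)
{
    return calloc(1, sizeof(struct record));
}
```

Zero-filling is not always the right initial state; where it is not, assigning each field explicitly right after allocation has the same effect as far as memcheck is concerned.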
