GNU C++ library stuck in a loop during vector allocation

Running Linux kernel 3.6.6-1 and gcc 4.7.2-2, the following program:

```cpp
#include <vector>
using namespace std;
int main()
{
    vector<size_t> a(1 << 24);
    return 0;
}
```

never returns from the line that constructs the vector.
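For scale, that constructor requests 1 << 24 elements and value-initializes every one of them to 0 (the gdb session shows this being done by std::__fill_n_a). The sketch below only does the arithmetic. It assumes a 64-bit target where size_t is 8 bytes and a 4 KiB page size; kAssumedPageSize is a hypothetical constant, not a value queried from the system.

```cpp
#include <cstddef>

// Back-of-the-envelope size of vector<size_t> a(1 << 24).
// Assumptions: sizeof(std::size_t) == 8 (64-bit target) and 4 KiB pages;
// kAssumedPageSize is a hypothetical constant, not read from the OS.
constexpr std::size_t kElements = std::size_t{1} << 24;  // 16777216 elements
constexpr std::size_t kBytes = kElements * sizeof(std::size_t);
constexpr std::size_t kAssumedPageSize = 4096;

// Number of pages the fill loop has to write to (rounded up).
constexpr std::size_t kPages = (kBytes + kAssumedPageSize - 1) / kAssumedPageSize;
```

On x86-64 this works out to 134217728 bytes (128 MiB), or 32768 pages of 4 KiB, each of which the kernel typically has to supply the first time the fill loop writes to it.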

When I run it in gdb, I see that it is stuck in stl_algobase.h at lines 743-744:

```
0x000000000040101c in std::__fill_n_a<unsigned long*, unsigned long, unsigned long> (__first=0x7fffeffd8060, __n=16777216, __value=@0x7fffffffe0a8: 0) at /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/bits/stl_algobase.h:743
740       __fill_n_a(_OutputIterator __first, _Size __n, const _Tp& __value)
741       {
742         const _Tp __tmp = __value;
743         for (__decltype(__n + 0) __niter = __n;
744              __niter > 0; --__niter, ++__first)
745           *__first = __tmp;
746         return __first;
747       }
```

__niter remains at 1 and never counts down to 0.
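As a sanity check on what this loop should do, the sketch below replays only the counter logic of the listing, without writing any memory; total_iterations and iterations_at_one are hypothetical helpers. Starting from n, the counter is decremented once per iteration, so in a healthy run it equals 1 during exactly one iteration, the last one, and the loop exits as soon as it reaches 0.

```cpp
#include <cstddef>

// Replays the counter of __fill_n_a (gcc 4.7 stl_algobase.h):
// niter starts at n and is decremented once per element written.
// Hypothetical helpers; constexpr loops need C++14 or later.

// Total number of iterations, i.e. elements the real loop writes.
constexpr std::size_t total_iterations(std::size_t n) {
    std::size_t count = 0;
    for (std::size_t niter = n; niter > 0; --niter)
        ++count;
    return count;
}

// How many iterations run while niter == 1 (at most one: the last).
constexpr std::size_t iterations_at_one(std::size_t n) {
    std::size_t at_one = 0;
    for (std::size_t niter = n; niter > 0; --niter)
        if (niter == 1)
            ++at_one;
    return at_one;
}
```

Small values of n are enough to see the pattern: the counter visits 1 once and then the loop ends.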

This behavior occurs only after my system has been running for a while. When it happens, the whole system seems to degrade: the GUI soon stops responding, and although I can still do a few things for a while, eventually the whole system becomes unusable and I have to reboot.

After a reboot, the above program behaves as expected.

Obviously, the problem is not in my program. This is just a symptom of some larger problem.

My question is: what should I do next?

I checked all my error logs and found nothing. I don't get hardware exceptions or anything like that, so it's hard to say exactly when my system goes into this state.

I'm out of ideas, so any help would be greatly appreciated.

edit:

I changed my compiler options to -g -Wall and got the same result.

Here is the disassembly of __fill_n_a (built with the new options):

```
0x00000000004010bd <+0>:   push   %rbp
0x00000000004010be <+1>:   mov    %rsp,%rbp
0x00000000004010c1 <+4>:   mov    %rdi,-0x18(%rbp)
0x00000000004010c5 <+8>:   mov    %rsi,-0x20(%rbp)
0x00000000004010c9 <+12>:  mov    %rdx,-0x28(%rbp)
0x00000000004010cd <+16>:  mov    -0x28(%rbp),%rax
0x00000000004010d1 <+20>:  mov    (%rax),%rax
0x00000000004010d4 <+23>:  mov    %rax,-0x10(%rbp)
0x00000000004010d8 <+27>:  mov    -0x20(%rbp),%rax
0x00000000004010dc <+31>:  mov    %rax,-0x8(%rbp)
0x00000000004010e0 <+35>:  jmp    0x4010f7 <std::__fill_n_a<unsigned long*, unsigned long, unsigned long>(unsigned long*, unsigned long, unsigned long const&)+58>
0x00000000004010e2 <+37>:  mov    -0x18(%rbp),%rax
0x00000000004010e6 <+41>:  mov    -0x10(%rbp),%rdx
0x00000000004010ea <+45>:  mov    %rdx,(%rax)
0x00000000004010ed <+48>:  subq   $0x1,-0x8(%rbp)
0x00000000004010f2 <+53>:  addq   $0x8,-0x18(%rbp)
0x00000000004010f7 <+58>:  cmpq   $0x0,-0x8(%rbp)
0x00000000004010fc <+63>:  setne  %al
0x00000000004010ff <+66>:  test   %al,%al
0x0000000000401101 <+68>:  jne    0x4010e2 <std::__fill_n_a<unsigned long*, unsigned long, unsigned long>(unsigned long*, unsigned long, unsigned long const&)+37>
0x0000000000401103 <+70>:  mov    -0x18(%rbp),%rax
0x0000000000401107 <+74>:  pop    %rbp
0x0000000000401108 <+75>:  retq
```
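Reading this disassembly against the source: __first is kept in -0x18(%rbp), __tmp in -0x10(%rbp) and __niter in -0x8(%rbp), and although the source tests `__niter > 0`, gcc emitted `cmpq $0x0` / `setne` / `jne`, which is the same test for an unsigned counter. The hypothetical function below is a C++ rendering of that unoptimized code, with each statement tied to the instruction offsets above; it illustrates the control flow and is not the code gcc actually compiled.

```cpp
// Hypothetical C++ rendering of the -O0 disassembly of
// std::__fill_n_a<unsigned long*, unsigned long, unsigned long>.
// Comments give the matching instruction offsets and stack slots.
constexpr unsigned long* fill_like_disassembly(
    unsigned long* first,         // %rdi -> -0x18(%rbp)
    unsigned long n,              // %rsi -> -0x20(%rbp)
    const unsigned long& value)   // %rdx (address of value) -> -0x28(%rbp)
{
    unsigned long tmp = value;    // <+16>..<+23>: load *value, store to -0x10(%rbp)
    unsigned long niter = n;      // <+27>..<+31>: copy n to -0x8(%rbp)
    // <+35> jumps straight to the test, so this is a test-first loop.
    while (niter != 0) {          // <+58>..<+68>: cmpq $0x0 / setne / test / jne
        *first = tmp;             // <+37>..<+45>: store tmp through first
        --niter;                  // <+48>: subq $0x1,-0x8(%rbp)
        ++first;                  // <+53>: addq $0x8,-0x18(%rbp) (8 = sizeof(unsigned long) on x86-64)
    }
    return first;                 // <+70>: return value in %rax
}

// Compile-time check on a small buffer (C++14 or later).
constexpr bool fill_like_disassembly_works() {
    unsigned long buf[4] = {};
    unsigned long* end = fill_like_disassembly(buf, 4, 7);
    if (end != buf + 4) return false;
    for (unsigned long v : buf)
        if (v != 7) return false;
    return true;
}
```

Knowing that __niter lives at -0x8(%rbp) also means its value can be read straight from the stack slot instead of relying on the debug information for the local variable.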

I also ran my system's memory diagnostic tool, which found no errors, and, as DL suggested, ran memtest86, also without errors.

edit:

I confirmed that this is not a hardware problem by running the same code on another machine. The other machine has the same kernel and compiler installed, and the program fails there in the same way.

I suspect ImageMagick. This seems to happen only after I run scripts that do a lot of ImageMagick conversions. I have had problems with ImageMagick before and had to set the shell variable MAGICK_THREAD_LIMIT=1.

1 answer

The general symptoms you describe sound like the system is running out of memory. If system memory usage does not look that high, the cause may be some kind of RAM problem, as commenters have noted.
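One way to check the memory-usage side of this on Linux is to read /proc/meminfo while the machine is in the bad state. The sketch below extracts a single field; meminfo_kb and system_meminfo_kb are hypothetical helpers, and they assume the usual `Name:   value kB` line format. Keep in mind that MemFree understates what is actually usable, because the kernel keeps otherwise idle RAM busy as page cache (the Cached line); kernels from 3.14 onward also report a MemAvailable estimate, which the 3.6 kernel in the question does not have.

```cpp
#include <fstream>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Returns the value (in kB) of a field such as "MemTotal" or "MemFree"
// from text in /proc/meminfo format ("MemFree:   123456 kB"), or
// std::nullopt if the field is missing. Requires C++17 (std::optional).
std::optional<unsigned long long> meminfo_kb(std::istream& in, const std::string& field) {
    const std::string prefix = field + ":";  // the colon avoids partial-name matches
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0)
            continue;
        std::istringstream rest(line.substr(prefix.size()));
        unsigned long long kb = 0;
        if (rest >> kb)
            return kb;
        return std::nullopt;  // field present but value unreadable
    }
    return std::nullopt;
}

// Convenience wrapper that reads the live file (Linux only); returns
// std::nullopt if the file cannot be opened.
std::optional<unsigned long long> system_meminfo_kb(const std::string& field) {
    std::ifstream file("/proc/meminfo");
    return meminfo_kb(file, field);
}
```

For example, comparing system_meminfo_kb("MemFree") and system_meminfo_kb("Cached") right after a reboot and again once the system starts misbehaving would show whether memory is actually being exhausted.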

You say:

__niter remains at 1 and never counts down to 0.

but this is not entirely clear: __niter should start at 16777216 and count down to 0. If you break into this program at a random moment, it will almost certainly be in this loop, but the value of __niter will almost certainly not be 1, and if you step through the loop, you will see it iterating. I am quite distrustful of the debug information emitted by gcc 4.7 (this has really been a problem since gcc 4.0), as gdb often prints incorrect values for local variables; if you look at the code and inspect the memory and registers directly, you can see the correct value.

If that is what is happening here, your problem probably has nothing to do with this program; it is system instability (possibly due to a hardware problem) that shows up as things hanging, this program among them. Given what this program does, the hang probably occurs when it touches a previously untouched page (taking a page fault) and the kernel tries to allocate a page. That points to a memory problem, but you note that you have already passed a memory diagnostic. Also make sure that nothing is overclocked or otherwise running out of spec.
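The page-fault theory can be probed without the vector. The sketch below is a hypothetical diagnostic, not something from the original thread: it allocates a buffer of the requested size, writes one element per page so that each write is typically the first touch of a fresh page, and reports any write that stalls. It assumes a POSIX system (sysconf) and that an allocation this large is served by fresh, untouched pages, which is what glibc normally does for large requests by using mmap, but that is not guaranteed.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <unistd.h>  // sysconf (POSIX)

// Hypothetical diagnostic: allocate `elements` size_t values, then write
// one element per page and report any write that takes longer than
// `slow_ms` milliseconds. The buffer need not start on a page boundary,
// so "one write per page" is approximate. Returns the number of writes.
std::size_t touch_pages(std::size_t elements, long long slow_ms) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t stride = page / sizeof(std::size_t);  // elements per page

    // new T[n] without () leaves the elements uninitialized: nothing is written yet.
    std::unique_ptr<std::size_t[]> buf(new std::size_t[elements]);
    volatile std::size_t* p = buf.get();  // volatile keeps the writes from being optimized away

    std::size_t writes = 0;
    for (std::size_t i = 0; i < elements; i += stride) {
        const auto start = std::chrono::steady_clock::now();
        p[i] = i;  // typically the first write to this page, so it may page-fault
        const long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                                 std::chrono::steady_clock::now() - start).count();
        if (ms > slow_ms)
            std::fprintf(stderr, "write %zu (element %zu) took %lld ms\n", writes, i, ms);
        ++writes;
    }
    return writes;
}
```

Calling touch_pages(std::size_t{1} << 24, 100) requests the same amount of memory as the original program. On a healthy system it typically returns promptly without printing anything; on a machine in the bad state, a long stall or a hang at some write would fit the explanation that the kernel is struggling to supply pages.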


Source: https://habr.com/ru/post/1446490/

