I have a set of programs that work together through shared memory (IPC), about 48 GB.
The programs run on Linux 3.6.0-rc5, are written in plain C, and are compiled with gcc. The load average on the host is 6.0, jumping to 16.0 every 10 seconds (24 cores).
One proxy receives data from other computers over 0mq (3.2, ~1000 msgs/s from 12 computers on the same network) and writes it to shared memory. Many (<50) workers read this data and perform some calculations.
The proxy uses about 20% of a CPU. Each worker uses about 1% of a CPU, periodically jumping to 10%.
All programs are written so that all allocations are made in init(), called when the program starts, and everything is freed in destroy(), called before exiting.
The repetitive code does not use malloc / calloc / free at all.
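To make the pattern concrete, here is a minimal sketch of what I mean, not my real code. The names `app_ctx`, `app_init`, `app_store`, `app_destroy` and the constant `ROLLING_SLOTS` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ROLLING_SLOTS 1024  /* hypothetical size, not from the real program */

/* Hypothetical context: every buffer the main loop needs lives here. */
struct app_ctx {
    uint64_t *frames;  /* scratch buffer, allocated once */
    size_t    next;    /* next slot to overwrite */
};

/* Allocate everything once at startup; returns 0 on success, -1 on failure. */
int app_init(struct app_ctx *ctx)
{
    ctx->frames = calloc(ROLLING_SLOTS, sizeof *ctx->frames);
    ctx->next = 0;
    return ctx->frames ? 0 : -1;
}

/* Steady-state work: reuses the preallocated slots, never touches the heap. */
void app_store(struct app_ctx *ctx, uint64_t value)
{
    assert(ctx->frames != NULL);
    ctx->frames[ctx->next] = value;
    ctx->next = (ctx->next + 1) % ROLLING_SLOTS;
}

/* Release everything once, right before exit. */
void app_destroy(struct app_ctx *ctx)
{
    free(ctx->frames);
    ctx->frames = NULL;
}
```

With this layout the only heap calls are in `app_init` and `app_destroy`, so anything that grows while the loop runs has to come from somewhere outside the loop's own code.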
But both programs still leak, about 120-240 bytes per minute. That is not much: memory is exhausted in 7-8 days and I just restart the processes, but those leaked bytes eat my mind every time the monitoring application tells me about such a restart :)
The bad thing: I can't run valgrind because of the shared memory use. It just stops at allocating / attaching the shared memory, and then everything starts to crash.
Trying to find this leak, I made a standalone version of the proxy: no leaks, but I can't feed it the same amount of data.
Running under gdb there are still no leaks, but the speed drops by about 2/3, so maybe it is just not fast enough to reproduce the error.
So the possible sources of the leak are:
- my code, but there is no malloc / calloc, just pointer arithmetic, memcpy, memcmp
- some standard library. glibc? syslog?
- 0mq when working with many sources (I don't think 1k msgs/s is too much traffic)
Are there any other tools / libs / hacks that can help in this situation?
Edit: Sivan Raptor asked about the code. The repetitive part is 5k lines of math, without any allocations, as I mentioned.
But here are the start, the stop, and the entry to the repetitive part:
    int main(int argc, char **argv)
    {
        ida_init(argc, argv, PROXY);
        ex_pollponies(); // repetitive
        ida_destroy();
        return(0);
    }

    // with some cuttings
    int ex_pollponies(void)
    {
        int i, rc;
        unsigned char buf[90];
        uint64_t fos[ROLLINGBUFFERSIZE];
        uint64_t bhs[ROLLINGBUFFERSIZE];
        int bfcnt = 0;
        uint64_t *fo;
        uint64_t *bh;

        while(1) {
            rc = zmq_poll(ex_in->poll_items, ex_in->count, EX_POLL_TIMEOUT);
            for (i=0; i < ex_in->count; i++) {
                if (ex_in->poll_items[i].revents & ZMQ_POLLIN) {
                    if (zmq_recv(ex_in->poll_items[i].socket, &buf, max_size, 0) == 0)
                        continue;
                    fo = &fos[bfcnt];
                    bh = &bhs[bfcnt];
                    bfcnt++;
                    if (bfcnt >= ROLLINGBUFFERSIZE)
                        bfcnt = 0;
                    memcpy(fo, (void *)&buf[1], sizeof(FRAMEOBJECT));
                    memcpy(bh, &buf[sizeof(FRAMEOBJECT)+1], sizeof(FRAMEHASH));
                    // then store fo, bh into shared memory, with some adjusting and checks
                    // storing around 1000 msgs every second, 16 bytes each. But the leak is only 200 bytes per minute.
                }
            }
        }
    }
Edit 2:
I finally got valgrind to work by using only part of the data (6 GB), and it finally ran through. It detected no leaks. But while it runs it takes 100% of a CPU, and my program definitely does not process all the incoming data, so it is not working under full load. This half-confirms my last-hope guess: the leak is in the data exchange part. I found information about mtrace (part of libc). It helped me track the ADDRESS of the leak: it is outside my code, in one of the threads. The only threads in my program are zeromq's. Then I started playing with the socket options (increasing hwm, buffers), and the leak rate decreased, but it did not go away completely even at absurdly large values :(
So now I'm 95% sure that it's zeromq that is leaking. I will try to find an answer on their mailing list.
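For anyone who wants to try the mtrace route, a minimal sketch of switching it on from inside a program looks roughly like this. `mtrace()`, `muntrace()` and the `MALLOC_TRACE` environment variable are glibc's; the wrappers `trace_begin`, `trace_end` and `leak_once` are hypothetical names made up for this illustration. As far as I know, on glibc 2.34 and later the tracing only takes effect when `libc_malloc_debug.so` is preloaded with `LD_PRELOAD`.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict -std modes */
#include <assert.h>
#include <mcheck.h>
#include <stdlib.h>

/* Point glibc's tracer at `path` and start logging allocations.
 * Returns 0 on success, -1 if the environment could not be set. */
int trace_begin(const char *path)
{
    assert(path != NULL);
    if (setenv("MALLOC_TRACE", path, 1) != 0)
        return -1;
    mtrace();  /* reads MALLOC_TRACE and installs the tracing hooks */
    return 0;
}

/* Stop logging; allocations made after this call are not recorded. */
void trace_end(void)
{
    muntrace();
}

/* Hypothetical stand-in for a code path that allocates and never frees. */
void *leak_once(size_t n)
{
    return malloc(n);
}
```

After the program has run, the `mtrace` script that ships with glibc (`mtrace ./program trace.log`) turns the log into a list of allocations that were never freed, together with the caller's address. That address is what showed me the leak sits outside my own code.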