XIO: fatal IO error 11

Yes, this question has been asked before, but reading the answers didn't really enlighten me.

I wrote a C program that crashes after several days of use. The important point is that it does NOT generate a core file, even though everything is configured for it (core_pattern, ulimit -c unlimited, etc.; I can trigger a core dump with kill -SIGQUIT).

The program logs extensively what it does, but there is no hint of a failure in the log. The only message displayed on the crash (or before it?) is:

XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0" after 2322 requests (2322 known processed) with 0 events remaining. 

So, two questions:
- How is it possible for the program to crash (returning $? = 1) without a core dump?
- What is this error message about, and what can I do?

The system is Red Hat Enterprise Linux 6.4.

Edit: I managed to get a core dump by calling abort() from within an atexit() callback:

    (gdb) bt
    #0  0x00bc8424 in __kernel_vsyscall ()
    #1  0x0085a861 in raise () from /lib/libc.so.6
    #2  0x0085c13a in abort () from /lib/libc.so.6
    #3  0x0808f5cf in Unexpected () at MyCode.c:1378
    #4  0x0085de9f in exit () from /lib/libc.so.6
    #5  0x00c85701 in _XDefaultIOError () from /usr/lib/libX11.so.6
    #6  0x00c85797 in _XIOError () from /usr/lib/libX11.so.6
    #7  0x00c84055 in _XReply () from /usr/lib/libX11.so.6
    #8  0x00c68b8f in XGetImage () from /usr/lib/libX11.so.6
    #9  0x004fd6a7 in ?? () from /usr/local/lib/libcvi.so
    #10 0x00478ad5 in ?? () from /usr/local/lib/libcvi.so
    ...
    #29 0x001eed9d in ?? () from /usr/local/lib/libcvi.so
    #30 0x001eee41 in RunUserInterface () from /usr/local/lib/libcvi.so
    #31 0x0808fab4 in main (argc=2, argv=0xbfbdc984) at MyCode.c:1540
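The atexit() trick described above can be sketched roughly as follows. This is a minimal illustration, not the code from MyCode.c: the names `normal_exit`, `unexpected` and `install_exit_trap` are hypothetical, and it assumes the program sets a flag before every exit it actually intends.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set to 1 just before any exit the program really intends.
   (Hypothetical flag, not taken from the original program.) */
int normal_exit = 0;

/* Runs from exit(). If nobody announced the exit, abort() raises SIGABRT,
   which produces a core dump whose backtrace still contains the caller
   of exit(), such as a library. */
static void unexpected(void)
{
    if (!normal_exit) {
        fputs("exit() called unexpectedly, forcing a core dump\n", stderr);
        abort();
    }
}

/* Call once early in main(). Returns 0 on success, like atexit(). */
int install_exit_trap(void)
{
    return atexit(unexpected);
}
```

A program would call `install_exit_trap()` at startup and set `normal_exit = 1;` right before its own `exit()` or `return` from `main()`. Any other exit, such as one made inside a library, then ends in `abort()` and leaves a core file.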

Can anyone tell me about this X11 issue? libcvi.so is not mine (it is LabWindows/CVI); only MyCode.c is.

Edit 2014-12-05: Here's an even clearer backtrace. Things definitely go wrong inside X11, but I'm not an X11 programmer, so looking at the X source code at the lines in question only tells me that the X server (?) is temporarily unavailable. Is there a way to just tell it to ignore this error if it is temporary?

    #4  0x00965eaf in __run_exit_handlers (status=1) at exit.c:78
    #5  exit (status=1) at exit.c:100
    #6  0x00c356b1 in _XDefaultIOError (dpy=0x88aeb80) at XlibInt.c:1292
    #7  0x00c35747 in _XIOError (dpy=0x88aeb80) at XlibInt.c:1498
    #8  0x00c340a6 in _XReply (dpy=0x88aeb80, rep=0xbf82fa90, extra=0, discard=0) at xcb_io.c:708
    #9  0x00c18c0f in XGetImage (dpy=0x88aeb80, d=27263845, x=0, y=0, width=60, height=20, plane_mask=4294967295, format=2) at GetImage.c:75
    #10 0x005f46a7 in ?? () from /usr/local/lib/libcvi.so

The corresponding source lines:

    XlibInt.c: _XDefaultIOError()
    1292:   exit(1);
    XlibInt.c: _XIOError
    1498:   _XDefaultIOError(dpy);
    xcb_io.c: _XReply()
    708:    if(!reply) _XIOError(dpy);
    GetImage.c: XGetImage()
    74:     if (_XReply (dpy, (xReply *) &rep, 0, xFalse) == 0 || ...
2 answers

OK, I finally found the cause (thanks to someone at National Instruments), a better diagnostic, and a workaround.

The bug is in many versions of libxcb: a 32-bit counter rollover problem that has been known for several years: https://bugs.freedesktop.org/show_bug.cgi?id=71338

Not all versions of libxcb are affected: libxcb-1.9-5 is, but libxcb-1.5-1 is not. According to the bug report, 64-bit OSes should not be affected, but I managed to trigger it on at least one.

This led me to a better diagnostic. The following program crashes in less than 15 minutes with the affected libraries (much better than the whole week it took before):

    // Compile with: gcc test.c -lX11 && time ./a.out
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *d = XOpenDisplay(NULL);  // connect to the display named by $DISPLAY
        if (!d)
            return 1;                     // no X server reachable
        for (;;)
            XNoOp(d);                     // send no-op requests; affected libraries eventually die with the XIO error
    }

One last thing: the above program, compiled and run on a 64-bit system, works fine. Compiled and run on an old 32-bit system, it also works fine. But if I copy the 32-bit executable to a 64-bit system, it crashes within a few minutes.
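To make the rollover and the 15-minute figure concrete, here is a small self-contained sketch. It is not libxcb or Xlib code; the helper names `counter_matches` and `minutes_to_rollover` are made up for illustration. It only shows that a 32-bit counter stops agreeing with the true request count at 2^32 requests, and that reaching 2^32 requests in under 15 minutes implies a rate of roughly 4.8 million requests per second.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 while a 32-bit request counter still equals
   the true (64-bit) number of requests, 0 once the narrow counter has wrapped. */
int counter_matches(uint64_t true_count)
{
    uint32_t narrow = (uint32_t)true_count;   /* value a 32-bit counter would hold */
    return (uint64_t)narrow == true_count;
}

/* Minutes needed to issue 2^32 requests at the given rate (requests per second). */
double minutes_to_rollover(double requests_per_second)
{
    return 4294967296.0 / requests_per_second / 60.0;
}
```

For example, `counter_matches(4294967295ULL)` is still true, `counter_matches(4294967296ULL)` is not, and at 5 million requests per second `minutes_to_rollover` gives a little over 14 minutes.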


I just had a program that behaved exactly the same way, with exactly the same error message. I would expect a counter bug to process 2^32 events before failing.

The program was structured so that the worker thread has an X connection separate from the one used by the X thread, so that it can send messages to the X thread to update the window.

In the end, I traced the problem to a place where the function that sends events to the window to redraw it was called by several threads without a mutex around it. Because Xlib is not re-entrant on the same X connection, the application ran into this exact error. I put a mutex on the function and have had no problems since.
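The fix described above amounts to funnelling every use of the shared connection through one lock. A rough sketch with pthreads follows; it is not the original program. The names `request_redraw`, `run_workers` and `requests_sent` are hypothetical, and a plain counter stands in for the X connection so the example runs without an X server.

```c
#include <pthread.h>

/* One lock guarding every use of the shared X connection. */
static pthread_mutex_t x_conn_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the shared connection state. In a real program this would be
   the Display pointer, and the locked section would send the redraw event. */
static long requests_sent = 0;

/* Hypothetical redraw request: the only entry point workers may call. */
void request_redraw(void)
{
    pthread_mutex_lock(&x_conn_lock);
    requests_sent++;                 /* real code: send the event and flush here */
    pthread_mutex_unlock(&x_conn_lock);
}

static void *worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        request_redraw();
    return NULL;
}

/* Starts up to 16 workers, each issuing per_thread requests, waits for them,
   and returns the total number of requests recorded. */
long run_workers(int threads, long per_thread)
{
    pthread_t tid[16];
    int started = 0;

    if (threads > 16)
        threads = 16;
    requests_sent = 0;
    for (int i = 0; i < threads; i++)
        if (pthread_create(&tid[started], NULL, worker, &per_thread) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return requests_sent;
}
```

With the lock in place, `run_workers(4, 100000)` records every request. Without it, the unsynchronized increments could be lost, which is the same kind of race that corrupts a shared X connection.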


Source: https://habr.com/ru/post/975140/

