How to detect an invalid fd / handle

I have a server application that handles network clients with asynchronous I/O. Client connections are accepted and then added to a descriptor set, which can be monitored with poll, epoll, select and so on. I use the apr_pollset_poll() call of Apache APR to check which descriptors can be read or written. Internally it uses epoll, poll, select, etc., depending on the platform.

The problem is that one of the socket descriptors somehow becomes invalid, and apr_pollset_poll returns error 10038, which is WSAENOTSOCK: an operation was attempted on something that is not a socket. Unfortunately, this makes my whole application stop working, instead of just dropping that one client connection. If I could somehow ignore this socket or remove it from the descriptor set, the application could carry on reading and writing the other sockets correctly. I know I have to find the root cause of the socket becoming invalid, but I need a fail-safe workaround in the meantime.

Once the descriptors are added to the pollset, they are handled by the OS / kernel, and I see no way to get them back so I can iterate over them. Keeping them in my own list as well would probably create other problems down the line, because whenever I closed a socket I would also have to remove it from that list, whereas the kernel's copy is cleaned up automatically.

Any suggestions?

+4
2 answers

That sounds awful, but it's an emergency when this happens. So I suggest going through all the descriptors in your working pollset and trying an operation on each one that will fail with this error if the descriptor is bogus. For example, you could create a new temporary pollset containing just that descriptor, do a non-blocking, zero-timeout poll on it, and see whether you get the error.
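
The one-at-a-time check described above might look roughly like the following. This is only a sketch under assumptions: it uses plain POSIX poll() rather than a temporary APR pollset, and the names fd_is_invalid and find_first_invalid are made up for illustration. On POSIX, poll() does not fail for a closed descriptor; it flags it with POLLNVAL in revents. On Windows, or with a select()-based pollset, the equivalent signal would be the call itself failing with WSAENOTSOCK.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if fd is not an open descriptor,
 * 0 if it is, and -1 if poll() itself failed for another reason. */
static int fd_is_invalid(int fd)
{
    struct pollfd p;

    if (fd < 0)
        return 1;               /* poll() silently ignores negative fds */
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    if (poll(&p, 1, 0) < 0)     /* timeout 0: the call never blocks */
        return -1;
    return (p.revents & POLLNVAL) != 0;
}

/* Hypothetical helper: index of the first invalid descriptor in
 * fds[0..n), or -1 if every descriptor looks valid. */
static long find_first_invalid(const int *fds, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (fd_is_invalid(fds[i]) == 1)
            return (long)i;
    return -1;
}
```

This assumes the application can enumerate its descriptors, for example from its own list of client connections. It costs one system call per descriptor, which is acceptable for a small set.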

If you have more than, say, a dozen descriptors in your pollset, you might consider a binary search instead of checking them one by one. Add half of your descriptors to a temporary pollset and perform the operation. If it fails, you know the bogus descriptor is in the set you just tried, so split that set in half and try again. If it does not fail, the bogus descriptor must be in the other half; you can either confirm that the other half fails, or simply assume it will, and split that half in two and try again. Continue until you have isolated a single failing descriptor. Clearly, if you have several bogus descriptors rather than just one, you may have to repeat this process several times.
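
The halving procedure above could be sketched as follows. Again this is an illustration under assumptions, not APR code: it uses POSIX select() as the "temporary pollset", because select() fails as a whole with EBADF when any descriptor in the set is not open, which resembles the all-or-nothing WSAENOTSOCK failure from the question. The names range_has_bad_fd and isolate_bad_fd are hypothetical, and descriptors outside the range select() can handle are simply treated as bad.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: probe fds[lo..hi) with a single zero-timeout
 * select(). Returns 1 if the probe reports a bad descriptor, else 0. */
static int range_has_bad_fd(const int *fds, size_t lo, size_t hi)
{
    fd_set rd;
    int maxfd = -1;
    struct timeval zero = { 0, 0 };

    FD_ZERO(&rd);
    for (size_t i = lo; i < hi; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return 1;           /* cannot be placed in an fd_set at all */
        FD_SET(fds[i], &rd);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (maxfd < 0)
        return 0;               /* empty range: nothing to probe */
    return select(maxfd + 1, &rd, NULL, NULL, &zero) < 0 && errno == EBADF;
}

/* Hypothetical helper: index of one bad descriptor in fds[0..n),
 * or -1 if probing the whole set succeeds. */
static long isolate_bad_fd(const int *fds, size_t n)
{
    size_t lo = 0, hi = n;

    if (!range_has_bad_fd(fds, lo, hi))
        return -1;
    /* Invariant: fds[lo..hi) contains at least one bad descriptor. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (range_has_bad_fd(fds, lo, mid))
            hi = mid;           /* the failure is in the first half */
        else
            lo = mid;           /* so it must be in the second half */
    }
    return (long)lo;
}
```

Each call finds one bad descriptor in roughly log2(n) probes. If several descriptors are bad, remove the one found and call the function again until it returns -1.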

Once a single descriptor is isolated, you can decide what to do about it and how. If and when the problem recurs, you can repeat the isolation process. Clearly, you would not run this unless you had detected the problem in the first place. But when something does go wrong, you need to isolate the problem, and this should achieve that.

+2

It turned out that I was calling close() on a socket descriptor that was being polled in another thread, and the select()-based pollset implementation did not like it. As a separate point, it would be possible to modify the APR library code so that it returns the offending descriptor when select() detects an invalid socket, or even removes it automatically.

0

Source: https://habr.com/ru/post/1382303/

