How do I get epoll to switch between multiple connections?

I am using epoll in what I believe is the typical manner for TCP sockets (based largely on this example, but slightly adapted to C++): one main listening socket bound to the port, and each new connection socket (from accept()) is also added, so that I am alerted when it is ready for recv(). I created a test script that basically hammers it with connections and sends/receives. When any single client is connected, it works flawlessly, endlessly.

However, adding a second simultaneous test client causes one of them to hang and fail. After a couple of days of debugging, I decided to just have it write the ID of the socket it is working with to a file, and I am puzzled by what I found.

When one script starts, I get just a stream of, in this case, 6. However, when the second script starts, I get a stream of 7. Just 7. And it stays at 7, communicating exclusively with the second client and completely ignoring the first, until the first reaches its timeout and closes. (Then, when client 2 reconnects, it gets ID 6 instead.)

It is worth noting that this test script does not use a persistent connection; it disconnects and reconnects after a few messages back and forth (for a more accurate simulation). But even through that, client 1 is ignored. If I set the timeout high enough that client 2 has time to exit, it still won't resume with client 1, because whatever it was waiting for has simply been lost.

Is it normal behavior for epoll (or sockets in general) to completely abandon a previous task when a new one arises? Is there some option I have to specify?

EDIT: This is as much of the code as I can show; I am not necessarily expecting "this is what you did wrong," but rather "these are some things that will break/fix a similar situation."

    #define EVENTMODE (EPOLLIN | EPOLLET | EPOLLRDHUP | EPOLLHUP)
    #define ERRCHECK (EPOLLERR | EPOLLHUP | EPOLLRDHUP)

    //Setup event buffer:
    struct epoll_event* events = (epoll_event*)calloc(maxEventCount, sizeof(event));

    //Setup done, main processing loop:
    int iter, eventCount;
    while (1) {
        //Wait for events indefinitely:
        eventCount = epoll_wait(pollID, events, maxEventCount, -1);
        if (eventCount < 0) {
            syslog(LOG_ERR, "Poll checking error, continuing...");
            continue;
        }
        for (iter = 0; iter<eventCount; ++iter) {
            int currFD = events[iter].data.fd;
            cout << "Working with " << events[iter].data.fd << endl;
            if (events[iter].events & ERRCHECK) {
                //Error or hangup:
                cout << "Closing " << events[iter].data.fd << endl;
                close(events[iter].data.fd);
                continue;
            } else if (!(events[iter].events & EPOLLIN)) {
                //Data not really ready?
                cout << "Not ready on " << events[iter].data.fd << endl;
                continue;
            } else if (events[iter].data.fd == socketID) {
                //Event on the listening socket, incoming connections:
                cout << "Connecting on " << events[iter].data.fd << endl;

                //Set up accepting socket descriptor:
                int acceptID = accept(socketID, NULL, NULL);
                if (acceptID == -1) {
                    //Error:
                    if (!(errno == EAGAIN || errno == EWOULDBLOCK)) {
                        //NOT just letting us know there's nothing new:
                        syslog(LOG_ERR, "Can't accept on socket: %s", strerror(errno));
                    }
                    continue;
                }
                //Set non-blocking:
                if (setNonBlocking(acceptID) < 0) {
                    //Error:
                    syslog(LOG_ERR, "Can't set accepting socket non-blocking: %s", strerror(errno));
                    close(acceptID);
                    continue;
                }
                cout << "Listening on " << acceptID << endl;
                //Add event listener:
                event.data.fd = acceptID;
                event.events = EVENTMODE;
                if (epoll_ctl(pollID, EPOLL_CTL_ADD, acceptID, &event) < 0) {
                    //Error adding event:
                    syslog(LOG_ERR, "Can't edit epoll: %s", strerror(errno));
                    close(acceptID);
                    continue;
                }
            } else {
                //Data on accepting socket waiting to be read:
                cout << "Receive attempt on " << event.data.fd << endl;
                cout << "Supposed to be " << currFD << endl;
                if (receive(event.data.fd) == false) {
                    sendOut(event.data.fd, streamFalse);
                }
            }
        }
    }

EDIT: The code has been changed, and removing edge-triggering does indeed stop epoll from locking onto a single client. It still has problems with clients not receiving data; debugging is under way to see whether it is the same issue or something else.

EDIT: It seems to be the same error in a different guise. It does try to receive on the second socket, but further logging shows that it actually hits EWOULDBLOCK almost every time. Interestingly, the logs report much more activity than is warranted - over 150,000 lines, when I would expect about 60,000. Deleting all the "Would block" lines reduces it to about the number I expect... and the remaining lines form the exact same pattern. Putting edge-triggering back stops the would-block behavior, apparently preventing it from spinning its wheels as fast as it can for no apparent reason. It still does not solve the original problem.

EDIT: To cover my bases, I decided to do more debugging on the sending side, since the hung client is obviously waiting for a message it never receives. However, I can confirm that the server sends a response for every request it processes; the hung client's request simply gets lost entirely and is therefore never responded to.

I also made sure that my receive loop reads until it actually hits EWOULDBLOCK (this is normally unnecessary, because the first two bytes of my message header contain the message size), but it did not change anything.

'Nother EDIT: I should probably clarify that this system uses a request/response format, and receiving, processing, and sending are all done in one shot. As you may guess, this requires reading the receive buffer until it is empty, which is the primary requirement for edge-triggered mode. If the received message is incomplete (which should never happen), the server basically returns false to the client, which, while technically an error, still allows the client to proceed with another request.

Debugging has confirmed that the client that hangs sends a request and waits for a response, but that request never triggers anything in epoll - it completely ignores the first client once the second has connected.

I also removed the attempt to receive immediately after accepting; in a hundred thousand tries, it was not ready once.

More EDIT: Fine, fine - if there is one thing that can goad me into an arbitrary task, it is questioning my abilities. So, here is the function where everything must be going wrong:

    bool receive(int socketID) {
        short recLen = 0;
        char buff[BUFFERSIZE];
        FixedByteStream received;
        short fullSize = 0;
        short diff = 0;
        short iter = 0;
        short recSoFar = 0;

        //Loop through received buffer:
        while ((recLen = read(socketID, buff, BUFFERSIZE)) > 0) {
            cout << "Receiving on " << socketID << endl;
            if (fullSize == 0) {
                //We don't know the size yet, that's the first two bytes:
                fullSize = ntohs(*(uint16_t*)&buff[0]);
                if (fullSize < 4 || recLen < 4) {
                    //Something went wrong:
                    syslog(LOG_ERR, "Received nothing.");
                    return false;
                }
                received = FixedByteStream(fullSize);
            }
            diff = fullSize - recSoFar;
            if (diff > recLen) {
                //More than received bytes left, get them all:
                for (iter=0; iter<recLen; ++iter) {
                    received[recSoFar++] = buff[iter];
                }
            } else {
                //Less than or equal to received bytes left, get only what we need:
                for (iter=0; iter<diff; ++iter) {
                    received[recSoFar++] = buff[iter];
                }
            }
        }
        if (recLen < 0 && errno == EWOULDBLOCK) {
            cout << "Would block on " << socketID << endl;
        }
        if (recLen < 0 && errno != EWOULDBLOCK) {
            //Had an error:
            cout << "Error on " << socketID << endl;
            syslog(LOG_ERR, "Connection receive error: %s", strerror(errno));
            return false;
        } else if (recLen == 0) {
            //Nothing received at all?
            cout << "Received nothing on " << socketID << endl;
            return true;
        }
        if (fullSize == 0) {
            return true;
        }
        //Store response, since it needs to be passed as a reference:
        FixedByteStream response = process(received);
        //Send response:
        sendOut(socketID, response);
        return true;
    }

As you can see, it cannot loop after encountering an error. I may not use C++ much, but I have been coding long enough to check for such mistakes before asking for help.

    bool sendOut(int socketID, FixedByteStream &output) {
        cout << "Sending on " << socketID << endl;
        //Send to socket:
        if (write(socketID, (char*)output, output.getLength()) < 0) {
            syslog(LOG_ERR, "Connection send error: %s", strerror(errno));
            return false;
        }
        return true;
    }

What happens if it hits EWOULDBLOCK? The same thing that happens if my motherboard melts - I will fix it. But it has not happened yet, so I am not going to fix it; I am just making sure I will know if it ever needs fixing.

And no, process() does nothing with the sockets; it only accepts and returns a fixed-length char array. Again, this program works perfectly with one client, just not with two or more.

Last EDIT: After even more debugging, I found the source of the problem. I will just go ahead and answer it myself.

+4
3 answers

event.data.fd? Why are you trying to use that? events[iter].data.fd is the one holding the descriptor you want to receive on. You may want to name your variables more clearly to avoid this problem in the future, so that you do not waste everyone's time. This is clearly not a problem with epoll.
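To illustrate the difference, here is a minimal sketch rather than the asker's actual code. It assumes Linux and uses two pipes in place of client sockets; staleEventDemo and DemoResult are made-up names. The struct passed to epoll_ctl still names whichever descriptor was registered last, while the array filled by epoll_wait names the descriptor that is actually ready:

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>

// Outcome of the demonstration (hypothetical type, for illustration only).
struct DemoResult {
    int firstFd;    // read end of the pipe that actually has data
    int reportedFd; // events[0].data.fd as filled in by epoll_wait
    int scratchFd;  // event.data.fd left over from the last epoll_ctl call
};

DemoResult staleEventDemo() {
    DemoResult r{-1, -1, -1};
    int first[2], second[2];
    if (pipe(first) < 0 || pipe(second) < 0) return r;
    int ep = epoll_create1(0);
    if (ep < 0) return r;

    epoll_event event{};          // scratch struct, reused for every registration
    event.events = EPOLLIN;
    event.data.fd = first[0];
    epoll_ctl(ep, EPOLL_CTL_ADD, first[0], &event);
    event.data.fd = second[0];    // "event" now names the second pipe
    epoll_ctl(ep, EPOLL_CTL_ADD, second[0], &event);

    // Only the first pipe becomes readable.
    if (write(first[1], "x", 1) != 1) return r;

    epoll_event events[4];
    int n = epoll_wait(ep, events, 4, 1000);
    if (n == 1) {
        r.firstFd = first[0];
        r.reportedFd = events[0].data.fd; // the descriptor that is ready
        r.scratchFd = event.data.fd;      // merely the last one registered
    }
    close(first[0]); close(first[1]);
    close(second[0]); close(second[1]);
    close(ep);
    return r;
}
```

In this sketch reportedFd matches the pipe that was written to, while scratchFd points at the other pipe, which is the same pattern as a server that keeps talking only to the most recently accepted client.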

+1

1) Do not use EPOLLET. It is much more complicated.

2) In your receive or read function, make sure that you do not call read or receive again after getting EWOULDBLOCK. Go back to waiting for an epoll hit.

3) Do not try to peek at the data or measure how much data there is. Just read it as fast as you can.

4) Remove the socket from the epoll set before you close it, unless you are sure that there is no other reference to the underlying connection endpoint.

It is really that simple. If you get these four things right, you will not have a problem. Most likely, you have botched number 2.
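As a rough sketch of point 2, assuming a non-blocking descriptor, a read loop could look like the following. The names drain, DrainStatus, and drainSelfTest are hypothetical, and the self-check uses a pipe purely for convenience; a real server would pass an accepted socket.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <cassert>

enum class DrainStatus { WouldBlock, Closed, Error };

// Reads everything currently available on a non-blocking descriptor and
// appends it to `out`. Once EAGAIN/EWOULDBLOCK is seen, it stops reading:
// the caller should go back to epoll_wait.
DrainStatus drain(int fd, std::string &out) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            out.append(buf, static_cast<size_t>(n));
            continue;
        }
        if (n == 0) return DrainStatus::Closed;   // peer closed the connection
        if (errno == EINTR) continue;             // interrupted by a signal, retry
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DrainStatus::WouldBlock;       // nothing left to read for now
        return DrainStatus::Error;                // genuine error
    }
}

// Small self-check with a non-blocking pipe (an assumption for the demo).
bool drainSelfTest() {
    int p[2];
    if (pipe(p) < 0) return false;
    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL, 0) | O_NONBLOCK);
    if (write(p[1], "hello", 5) != 5) return false;
    std::string got;
    DrainStatus s = drain(p[0], got);
    close(p[0]);
    close(p[1]);
    return s == DrainStatus::WouldBlock && got == "hello";
}
```

Whether bytes are appended to a per-connection buffer, as here, or parsed on the fly is a design choice; the point is only that the loop ends at EWOULDBLOCK and control returns to the event loop.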

Also, how do you deal with EWOULDBLOCK when you send? What does your sendOut function look like? (There are many right ways to do it, but also many wrong ways.)
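One of the possible right ways, sketched under the assumption of a non-blocking descriptor, is to write as much as the kernel accepts and report how far it got, so that the caller can keep the remainder and wait for EPOLLOUT. The helper names writeSome and writeSomeSelfTest are made up, and the self-check assumes the pipe buffer is smaller than 1 MiB, which is the Linux default.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>
#include <cassert>

// Tries to write len bytes to a non-blocking descriptor. Returns the number
// of bytes actually written (possibly fewer than len when the kernel buffer
// fills up), or -1 on a real error.
ssize_t writeSome(int fd, const char *data, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, data + sent, len - sent);
        if (n > 0) {
            sent += static_cast<size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) continue;      // interrupted, retry
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                                   // buffer full: wait for EPOLLOUT
        return -1;                                   // anything else is treated as an error
    }
    return static_cast<ssize_t>(sent);
}

// Self-check: a 1 MiB write into an unread pipe can only partly succeed.
bool writeSomeSelfTest() {
    int p[2];
    if (pipe(p) < 0) return false;
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL, 0) | O_NONBLOCK);
    std::vector<char> big(1 << 20, 'x');
    ssize_t n = writeSome(p[1], big.data(), big.size());
    close(p[0]);
    close(p[1]);
    return n > 0 && static_cast<size_t>(n) < big.size();
}
```

A caller that gets back fewer bytes than it asked for would store the unsent tail per connection and try again when epoll reports the socket writable.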

+1

Revision of my original answer.

I see a few suspicious things, and I have some suggestions.

  • When the listening socket is signaled, the code goes into an infinite loop until accept fails. I wonder whether that loop gives priority to accepting new connections over handling epoll events. That is, you always have a connection ready to be accepted, and you never break out of the inner while (1) loop. Do not loop on accept. Instead, make the listening socket NOT edge-triggered when you add it to epoll. Then just accept one connection at a time, so that subsequent epoll events are processed after accept returns. In other words, take that inner "while (1)" loop out.

  • After your accept call returns a valid socket (and you have finished making it non-blocking and adding it to the epoll with edge-triggering), go ahead and call your receive function on the accepted socket. I assume that your receive function can handle EWOULDBLOCK and EAGAIN errors. In other words, for edge-triggered sockets, do not assume that you will get an EPOLLIN notification for a new socket. Just try to receive on it anyway. If there is no data, you will get an EPOLLIN notification later when the data arrives.

  • Why aren't you listening for EPOLLOUT with respect to your sendOut function? Does sendOut switch the socket back to blocking? In any case, when receive() returns success, change your epoll listener on the socket to EPOLLOUT, then try an opportunistic call to your sendOut function as if you had just received an EPOLLOUT notification.

  • And if all else fails, try turning off the edge-triggered (EPOLLET) behavior altogether. Perhaps your receive function is not consuming all the data from the first EPOLLIN notification.

  • If epoll_ctl fails to add a new socket, it seems a bit harsh to kill the whole application. I would just close the offending socket, assert, and continue.
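For the EPOLLOUT suggestion in the list above, a minimal sketch of switching a registered descriptor between read and write interest with EPOLL_CTL_MOD might look like this. It assumes Linux; setInterest and interestSelfTest are hypothetical names, and a socketpair stands in for an accepted client connection.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cassert>

// Replaces the event mask of a descriptor that is already in the epoll set.
bool setInterest(int ep, int fd, uint32_t mask) {
    epoll_event ev{};
    ev.events = mask;
    ev.data.fd = fd;
    return epoll_ctl(ep, EPOLL_CTL_MOD, fd, &ev) == 0;
}

// Self-check: an idle socket reports nothing for EPOLLIN, but reports
// EPOLLOUT as soon as we ask for it, because its send buffer is empty.
bool interestSelfTest() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return false;
    int ep = epoll_create1(0);
    if (ep < 0) return false;

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = sv[0];
    bool ok = epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0;

    epoll_event out[1];
    ok = ok && epoll_wait(ep, out, 1, 0) == 0;     // no request has arrived yet
    ok = ok && setInterest(ep, sv[0], EPOLLOUT);   // a response is pending: watch for writability
    ok = ok && epoll_wait(ep, out, 1, 1000) == 1
            && (out[0].events & EPOLLOUT) != 0
            && out[0].data.fd == sv[0];
    ok = ok && setInterest(ep, sv[0], EPOLLIN);    // response sent: wait for the next request

    close(sv[0]);
    close(sv[1]);
    close(ep);
    return ok;
}
```

Leaving EPOLLOUT registered permanently on a level-triggered socket would wake the loop continuously, which is why this sketch switches back to EPOLLIN once the response is out.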

0

Source: https://habr.com/ru/post/1391388/

