I'm using epoll in what I believe to be the typical manner for TCP sockets (based largely on this example, but slightly adapted to C++): one main listening socket bound to the port, and each new connection socket (from accept()) is also added for notification when it's ready for recv(). I have created a test script that basically hammers it with connections and sends/receives. When any single client is connected, it works flawlessly, endlessly.
However, adding a second simultaneous test client causes one of them to hang and fail. After a couple of days of debugging, I decided to just have it dump the ID of the socket it's working with to a file, and I'm perplexed by what I found.
When one script starts, I get just a stream of, in this case, 6. However, when the second script starts, I get a stream of 7. Just 7. And it stays at 7, communicating exclusively with the second client and completely ignoring the first, until the first reaches its timeout and closes. (Then, when client 2 reconnects, it gets ID 6 instead.)
It is worth noting that in this test the script does not use a persistent connection; it disconnects and reconnects after several messages back and forth (for a more accurate simulation). But even through that, client 1 is ignored. If I set the timeout high enough that client 2 has time to exit, it still won't resume with client 1, because whatever it was waiting for has simply been lost.
Is this normal behavior, for epoll (or sockets in general) to completely abandon a previous task when a new one arises? Is there some option I have to specify?
EDIT: This is as much of the code as I can show; I'm not necessarily expecting "this is what you did wrong," but rather, "these are some things that will break / fix a similar situation."
    #define EVENTMODE (EPOLLIN | EPOLLET | EPOLLRDHUP | EPOLLHUP)
    #define ERRCHECK (EPOLLERR | EPOLLHUP | EPOLLRDHUP)

    //Setup event buffer:
    struct epoll_event* events = (epoll_event*)calloc(maxEventCount, sizeof(event));

    //Setup done, main processing loop:
    int iter, eventCount;
    while (1) {
        //Wait for events indefinitely:
        eventCount = epoll_wait(pollID, events, maxEventCount, -1);
        if (eventCount < 0) {
            syslog(LOG_ERR, "Poll checking error, continuing...");
            continue;
        }
        for (iter = 0; iter<eventCount; ++iter) {
            int currFD = events[iter].data.fd;
            cout << "Working with " << events[iter].data.fd << endl;
            if (events[iter].events & ERRCHECK) {
                //Error or hangup:
                cout << "Closing " << events[iter].data.fd << endl;
                close(events[iter].data.fd);
                continue;
            } else if (!(events[iter].events & EPOLLIN)) {
                //Data not really ready?
                cout << "Not ready on " << events[iter].data.fd << endl;
                continue;
            } else if (events[iter].data.fd == socketID) {
                //Event on the listening socket, incoming connections:
                cout << "Connecting on " << events[iter].data.fd << endl;

                //Set up accepting socket descriptor:
                int acceptID = accept(socketID, NULL, NULL);
                if (acceptID == -1) {
                    //Error:
                    if (!(errno == EAGAIN || errno == EWOULDBLOCK)) {
                        //NOT just letting us know there nothing new:
                        syslog(LOG_ERR, "Can't accept on socket: %s", strerror(errno));
                    }
                    continue;
                }
                //Set non-blocking:
                if (setNonBlocking(acceptID) < 0) {
                    //Error:
                    syslog(LOG_ERR, "Can't set accepting socket non-blocking: %s", strerror(errno));
                    close(acceptID);
                    continue;
                }
                cout << "Listening on " << acceptID << endl;
                //Add event listener:
                event.data.fd = acceptID;
                event.events = EVENTMODE;
                if (epoll_ctl(pollID, EPOLL_CTL_ADD, acceptID, &event) < 0) {
                    //Error adding event:
                    syslog(LOG_ERR, "Can't edit epoll: %s", strerror(errno));
                    close(acceptID);
                    continue;
                }
            } else {
                //Data on accepting socket waiting to be read:
                cout << "Receive attempt on " << event.data.fd << endl;
                cout << "Supposed to be " << currFD << endl;
                if (receive(event.data.fd) == false) {
                    sendOut(event.data.fd, streamFalse);
                }
            }
        }
    }
EDIT: The code has been revised, and removing edge-triggering does indeed stop epoll from locking onto one client. It still has problems with clients not receiving data; debugging is underway to see whether it's the same issue or something else.
EDIT: It seems to be the same error in a different guise. It does try to receive on the second socket, but further logging reports that it actually hits EWOULDBLOCK almost every time. Interestingly, the logs report much more activity than is warranted: more than 150,000 lines, when I'd expect about 60,000. Deleting all the "Will be blocked" lines reduces it to about the number I expect... and lo and behold, the remaining lines form the exact same pattern. Reverting to edge-triggered mode stops the EWOULDBLOCK behavior, apparently preventing it from spinning its wheels as fast as it can for no apparent reason. It still doesn't solve the original problem.
EDIT: To cover my bases, I decided to do more debugging on the sending side, since the hung client is clearly waiting for a message that it never receives. However, I can confirm that the server sends a response for every request it processes; the hung client's request is simply lost entirely, and therefore never gets a response.
I have also made sure that my receive loop reads until it actually hits EWOULDBLOCK (this is normally unnecessary, since the first two bytes of my message header contain the message size), but it didn't change anything.
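To illustrate why a size header normally tells the receiver when it has a whole message, here is a rough sketch of length-prefixed framing. The byte order (big-endian) and the convention that the size counts only the payload are assumptions made for the example, and extractMessage is a hypothetical name; none of this is the actual wire format of the program in question.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Tries to pull one complete message off the front of `pending`.
// Assumed framing (not confirmed by the question): a 2-byte big-endian
// length that counts only the payload bytes following it.
// Returns true and fills `payload` if a whole message was available.
bool extractMessage(std::string &pending, std::string &payload) {
    const std::size_t headerSize = 2;
    if (pending.size() < headerSize) {
        return false;  // header not complete yet
    }
    const std::uint16_t length = static_cast<std::uint16_t>(
        (static_cast<unsigned char>(pending[0]) << 8) |
         static_cast<unsigned char>(pending[1]));
    if (pending.size() < headerSize + length) {
        return false;  // payload not complete yet
    }
    payload = pending.substr(headerSize, length);
    pending.erase(0, headerSize + length);  // keep any bytes of the next message
    return true;
}
```

Bytes left in `pending` after a successful call belong to the next message, so a caller would typically keep one such buffer per connection.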
'Nother EDIT: I should probably clarify that this system uses a request / response format, and receiving, processing, and sending are all done in one shot. As you may guess, this requires reading the receive buffer until it's empty, which is the primary requirement for edge-triggered mode. If the received message is incomplete (which should never happen), the server basically returns false to the client, which, while technically an error, still allows the client to proceed with another request.
Debugging has confirmed that the client that hangs will send a request and wait for a response, but that request never triggers anything in epoll; it completely ignores the first client once the second has connected.
I also removed the attempt to receive immediately after accepting; in a hundred thousand tries, it wasn't ready once.
More EDIT: Fine, fine - if there is one thing that can goad me into an arbitrary task, it's questioning my abilities. So, here is the function where everything must be going wrong:
    bool receive(int socketID) {
        short recLen = 0;
        char buff[BUFFERSIZE];
        FixedByteStream received;
        short fullSize = 0;
        short diff = 0;
        short iter = 0;
        short recSoFar = 0;
As you can see, it cannot loop after encountering an error. I may not use C++ much, but I've been coding long enough to check for such mistakes before asking for help.
    bool sendOut(int socketID, FixedByteStream &output) {
        cout << "Sending on " << socketID << endl;
        //Send to socket:
        if (write(socketID, (char*)output, output.getLength()) < 0) {
            syslog(LOG_ERR, "Connection send error: %s", strerror(errno));
            return false;
        }
        return true;
    }
"What if it's EWOULDBLOCK?" Same as if my motherboard melts - I'll fix it. But it hasn't happened yet, so I'm not going to fix it; I'm just making sure I'll know if it ever needs fixing.
And no, process() doesn't do anything with the sockets; it only accepts and returns a fixed-length char array. Again, this program works perfectly with one client, just not with two or more.
Last EDIT: After yet more debugging, I have found the source of the problem. I'll just go ahead and answer it myself.