TCP Client Message Processing

Question

I get a stream of bytes and I need to split the messages, e.g.

Message1\nMessage2\nMessage3\nMess

Each message is terminated with a '\n' character, but when a complete message does not fit in the buffer, I receive part of it in one recv call and the rest in the next, which may require reallocating memory to append the message.

Am I doing this right or is there a better way to process messages instead of reallocating the buffer?

c++ winsock

will Feb 04 '11 at 9:28

4 answers

badgerr · Answer 1 · 2011-02-04T09:31:08+0000

You can prefix each message with its length and read that first. Then allocate a buffer large enough to hold the contents, and call recv until the required number of bytes has been read.

For example:

    int len = 0;
    if (recv(socket, reinterpret_cast<char*>(&len), sizeof(int), 0) == sizeof(int))
    {
        std::vector<char> buffer;
        buffer.resize(len);

        int bytesRead = 0;
        while (bytesRead < len)
        {
            // Read as much as we can. Note: byteInc may not == len-bytesRead.
            int byteInc = recv(socket, &buffer[bytesRead], len - bytesRead, 0);
            if (byteInc > 0)
            {
                bytesRead += byteInc;
            }
            else
            {
                // 0 means the peer closed the connection, SOCKET_ERROR means
                // failure. Should probably handle this properly.
                break;
            }
        }

        // buffer now contains the complete message.
        some_processing_function(buffer);
    }

Omnifarious · Answer 2 · 2011-02-05T03:20:42+0000

The length-prefixed option is probably the best. It lets you be smart about allocating buffers on the receiving side and lets you send messages containing any character you want. It also saves you from examining every character to see whether you have reached the end of the message. Unfortunately, it is very easy to implement this badly.

Here is code that does it properly.

On the receiver side:

    unsigned char lenbuf[4];

    // This whole thing with the while loop occurs twice here, should probably
    // have its own function.
    {
        unsigned int bytesRead = 0;
        while (bytesRead < 4)
        {
            // Read as much as we can. Note: byteInc may not == 4-bytesRead.
            int byteInc = recv(socket, reinterpret_cast<char*>(&lenbuf[bytesRead]),
                               4 - bytesRead, 0);
            if (byteInc > 0)
            {
                bytesRead += byteInc;
            }
            else
            {
                // Connection closed (0) or SOCKET_ERROR.
                // Should probably handle this error properly.
                break;
            }
        }
    } // end scope for bytesRead

    unsigned int len = ((lenbuf[0] & 0xffu) << 24) | ((lenbuf[1] & 0xffu) << 16)
                     | ((lenbuf[2] & 0xffu) << 8) | (lenbuf[3] & 0xffu);

    ::std::vector<char> buffer;
    buffer.resize(len);

    {
        unsigned int bytesRead = 0;
        while (bytesRead < len)
        {
            // Read as much as we can. Note: byteInc may not == len-bytesRead.
            int byteInc = recv(socket, &buffer[bytesRead], len - bytesRead, 0);
            if (byteInc > 0)
            {
                bytesRead += byteInc;
            }
            else
            {
                // Connection closed (0) or SOCKET_ERROR.
                // Should probably handle this error properly.
                break;
            }
        }

        // buffer now contains the complete message.
        some_processing_function(buffer);
    }

On the sending side:

    const unsigned char lenbuf[4] = {
        static_cast<unsigned char>((bytesToSend >> 24) & 0xffu),
        static_cast<unsigned char>((bytesToSend >> 16) & 0xffu),
        static_cast<unsigned char>((bytesToSend >> 8) & 0xffu),
        static_cast<unsigned char>(bytesToSend & 0xffu)
    };

    // This basic block is repeated twice and should be in a function.
    {
        unsigned int bytesSent = 0;
        while (bytesSent < 4)
        {
            const int sentNow = send(socket, reinterpret_cast<const char*>(&lenbuf[bytesSent]),
                                     4 - bytesSent, 0);
            if (sentNow != SOCKET_ERROR)
            {
                bytesSent += sentNow;
            }
            else
            {
                // Should handle this error somehow.
                break;
            }
        }
    }

    {
        unsigned int bytesSent = 0;
        while (bytesSent < bytesToSend)
        {
            const unsigned int toSend = bytesToSend - bytesSent;
            const int sentNow = send(socket, &byteBuf[bytesSent], toSend, 0);
            if (sentNow != SOCKET_ERROR)
            {
                bytesSent += sentNow;
            }
            else
            {
                // Should handle this error somehow.
                break;
            }
        }
    }

The main problem with the other code posted is that it does not handle things well if you receive only part of the length rather than all of it. Nothing guarantees that the stream will not be split in the middle of the length information.
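The repeated "read until N bytes have arrived" loop can be factored into one helper, which also covers a length field that arrives in pieces. The following is a minimal sketch, not code from the answer: `ReadFn` and `read_exact` are hypothetical names, and the reader callback is assumed to behave like `recv` (positive byte count, 0 when the peer closes, negative on error).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical reader type that mimics recv(): returns the number of bytes
// read (> 0), 0 when the peer closed the connection, or a negative value
// on error.
using ReadFn = std::function<int(char* dst, int maxBytes)>;

// Keep reading until exactly `count` bytes have been stored in `dst`.
// Returns false if the stream ends or fails before that.
bool read_exact(const ReadFn& readSome, char* dst, std::size_t count)
{
    assert(dst != nullptr || count == 0);
    std::size_t done = 0;
    while (done < count)
    {
        const int n = readSome(dst + done, static_cast<int>(count - done));
        if (n <= 0)
        {
            return false; // closed or failed: the caller decides what to do
        }
        done += static_cast<std::size_t>(n);
    }
    return true;
}
```

With Winsock, the callback could wrap the real call, for example a lambda that returns `recv(socket, dst, maxBytes, 0)`. The same helper would then be called once for the 4 length bytes and once for the message body.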

Another problem is that the length is sent in a way that is not CPU- and compiler-agnostic. Different kinds of CPUs and different C++ compilers store their integers in different ways. If the compiler/CPU combination used by the sender differs from the one used by the receiver, this will cause problems.

So explicitly packing the integer into characters in a platform-neutral way and unpacking it again on the other side is the best way to go.
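The packing and unpacking can be isolated in two small functions so both sides agree on the byte order. This is a sketch under the same assumption as the code above (4 bytes, most significant byte first); `encode_length` and `decode_length` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack a length into 4 bytes, most significant byte first, independent of
// how the host CPU stores integers.
std::array<unsigned char, 4> encode_length(std::size_t len)
{
    assert(len <= 0xffffffffu); // must fit in the 4-byte prefix
    return {{
        static_cast<unsigned char>((len >> 24) & 0xffu),
        static_cast<unsigned char>((len >> 16) & 0xffu),
        static_cast<unsigned char>((len >> 8) & 0xffu),
        static_cast<unsigned char>(len & 0xffu)
    }};
}

// Rebuild the length from the 4 bytes produced by encode_length.
std::uint32_t decode_length(const std::array<unsigned char, 4>& b)
{
    return (static_cast<std::uint32_t>(b[0]) << 24)
         | (static_cast<std::uint32_t>(b[1]) << 16)
         | (static_cast<std::uint32_t>(b[2]) << 8)
         |  static_cast<std::uint32_t>(b[3]);
}
```

Because only shifts and masks on values are used, the result does not depend on the endianness of either machine.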

Bojan Komazec · Answer 3 · 2011-02-04T09:55:49+0000

If the incoming message can be very long (megabytes or gigabytes), you can use a fixed-length buffer plus an auxiliary data structure in which you store the pieces of MessageN (N = 1, 2, ...). Each recv() fills the buffer from the very beginning. Then you process its contents by searching for \n. If you find it, you can extract a new message (MessageN); if not, store the contents of the buffer in the auxiliary data structure (probably a vector or a list) and call recv() again. If you find \n and the list is not empty, the bytes before \n are the last piece of MessageN: concatenate the list elements and this piece, then empty the list. If you find \n and the list is empty, all bytes from the beginning of the buffer up to \n are MessageN. Any bytes after \n that are not followed by another \n must then be saved in the list as the first part of Message(N+1).
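The procedure above can be sketched as a small class that is fed each chunk returned by recv(). This is an illustration rather than the answerer's code: `LineFramer` is a hypothetical name, and a single std::string stands in for the list of fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class LineFramer
{
public:
    // Feed one received chunk; returns every message completed by it.
    // An unterminated tail is kept until a later chunk supplies the '\n'.
    std::vector<std::string> feed(const char* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        std::vector<std::string> complete;
        std::size_t start = 0;
        for (std::size_t i = 0; i < size; ++i)
        {
            if (data[i] != '\n')
            {
                continue;
            }
            // Bytes before '\n' finish the message started in earlier chunks.
            pending_.append(data + start, i - start);
            complete.push_back(std::move(pending_));
            pending_.clear();
            start = i + 1;
        }
        // Whatever follows the last '\n' is the first part of the next message.
        if (start < size)
        {
            pending_.append(data + start, size - start);
        }
        return complete;
    }

private:
    std::string pending_; // partial message carried between chunks
};
```

Feeding it "Message1\nMessage2\nMess" yields the first two messages and keeps "Mess" until the next chunk completes it. The receive buffer itself never has to grow; only the pending fragment does.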

Errata · Answer 4 · 2011-02-04T10:40:14+0000

If you do not need to receive the entire message before you start processing it, you can also use a circular buffer (wiki, boost).
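For illustration, a fixed-capacity byte ring can be sketched in a few lines. This is a simplified stand-in for a library such as Boost's circular buffer, not its API; `ByteRing` and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ByteRing
{
public:
    explicit ByteRing(std::size_t capacity)
        : buf_(capacity), head_(0), size_(0)
    {
        assert(capacity > 0);
    }

    std::size_t size() const { return size_; }
    std::size_t free_space() const { return buf_.size() - size_; }

    // Append up to n bytes; returns how many were actually stored.
    std::size_t write(const char* src, std::size_t n)
    {
        std::size_t stored = 0;
        while (stored < n && size_ < buf_.size())
        {
            buf_[(head_ + size_) % buf_.size()] = src[stored++];
            ++size_;
        }
        return stored;
    }

    // Remove up to n bytes from the front; returns how many were copied out.
    std::size_t read(char* dst, std::size_t n)
    {
        std::size_t taken = 0;
        while (taken < n && size_ > 0)
        {
            dst[taken++] = buf_[head_];
            head_ = (head_ + 1) % buf_.size();
            --size_;
        }
        return taken;
    }

private:
    std::vector<char> buf_; // storage, reused in a wrap-around fashion
    std::size_t head_;      // index of the oldest byte
    std::size_t size_;      // number of bytes currently stored
};
```

The receiving code would write each recv() chunk into the ring and the processing code would drain it, so memory use stays bounded by the capacity and nothing is reallocated.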

Also, sending the size first is good when the receiver cannot know it in advance, but may I suggest that you not use a plain unsigned int for it, since a rogue client can make you allocate a lot of memory (so it makes sense to limit the length).
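One way to apply such a limit is to validate the declared length before allocating. This is a sketch only: `kMaxMessageLength`, its 1 MiB value, and `make_message_buffer` are assumptions chosen for illustration, and the right cap depends on the application.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed application-specific cap; 1 MiB is an arbitrary illustrative value.
const std::uint32_t kMaxMessageLength = 1u << 20;

// Allocate the receive buffer only after checking the length the peer
// declared, so a hostile or buggy client cannot force a huge allocation.
std::vector<char> make_message_buffer(std::uint32_t declaredLength)
{
    if (declaredLength > kMaxMessageLength)
    {
        throw std::length_error("declared message length exceeds limit");
    }
    return std::vector<char>(declaredLength);
}
```

On a rejected length the caller would typically close the connection, since the stream can no longer be parsed reliably.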
