I have two Linux servers (let's call them A and B) connected to the same unmanaged switch. I have disabled the firewall on both servers (no rules in any table, and every default policy is ACCEPT). So nothing should prevent one server from sending arbitrary TCP/IP packets and the other from receiving them unmodified.
On A, I run a TCP server application that listens for and accepts incoming connections, then sends a lot of data in a loop to each connected client. It never reads from the client, and it expects write() on the socket to fail with EPIPE if/when the client disconnects.
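To make the server's behaviour concrete, here is a minimal sketch of the send loop in Python. It is not the real application: the function name serve_until_error and the chunk size are made up for illustration, and it handles a single client only.

```python
import errno
import socket


def serve_until_error(listener, chunk=b"x" * 65536):
    """Accept one client, then write to it forever without ever reading.

    Returns the errno of the first failed send. On a healthy path this is
    expected to be EPIPE (or ECONNRESET) once the peer answers with RST.
    """
    conn, _peer = listener.accept()
    with conn:
        try:
            while True:
                # sendall() blocks while the socket send buffer is full;
                # this is where the real server hangs when no RST arrives.
                conn.sendall(chunk)
        except OSError as exc:
            return exc.errno
```

Python ignores SIGPIPE by default, so the failure surfaces as an exception instead of killing the process; a C server would need to ignore SIGPIPE (or pass MSG_NOSIGNAL to send()) to see EPIPE as a return value.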
Then, on B, I run nc (netcat) as the client: it connects to the server application on A and starts receiving data, and a few seconds later I press Ctrl-C to abort the connection.
What I see is that the server application on A just hangs in write(); it gets neither EPIPE nor any other error.
I traced the TCP/IP packets with tcpdump, and here is what I see:
- After netcat is interrupted on B, B sends a FIN to A, and A correctly responds with an ACK to that FIN. So we now have a legitimate half-open TCP connection, which is fine.
- Next, A tries to send further data to the client in the usual ACK and PSH,ACK packets, which is also expected and correct.
- BUT B does not respond to these packets in any way (I expect it to respond with an RST, since it is receiving packets for an already closed / non-existent TCP connection).
- A gets no ACK, so it stops sending new data and starts retransmitting old packets (and at that point the next call to write() hangs).
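For anyone reproducing this, one way to confirm which TCP state each end is in is to read /proc/net/tcp on both machines. The sketch below is only an illustration under some assumptions: IPv4 only, a little-endian host (addresses in that file are printed in host byte order), and helper names (parse_proc_net_tcp, states_for_port) that are hypothetical. After the FIN/ACK exchange, I would expect A's side of the connection to show CLOSE_WAIT.

```python
import socket
import struct

# State codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def _decode(addr):
    """Turn 'HEXIP:HEXPORT' into (dotted_ip, port); assumes little-endian IPv4."""
    ip_hex, port_hex = addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_proc_net_tcp(text):
    """Return (local, remote, state) tuples, skipping the header line."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        state = TCP_STATES.get(fields[3], fields[3])
        rows.append((_decode(fields[1]), _decode(fields[2]), state))
    return rows


def states_for_port(port, path="/proc/net/tcp"):
    """List connections whose local or remote port equals `port`."""
    with open(path) as f:
        rows = parse_proc_net_tcp(f.read())
    return [r for r in rows if port in (r[0][1], r[1][1])]
```

Calling states_for_port() with the server's port on A and on B right after pressing Ctrl-C shows whether B still has a socket for the connection at all, which is what decides how it should treat the incoming data.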
I also tried running netcat on A itself (so the client and server applications run on the same physical server), and that way everything worked as expected: the server application received EPIPE immediately after I interrupted netcat with Ctrl-C, and tcpdump showed that the RST packet was sent as expected.
So, what could be preventing the RST from being sent in this case?
I am using Hardened Gentoo Linux, fully updated, kernel 2.6.39-hardened-r8, without any custom network-related sysctl settings.
It may also be important that there is significant network activity on these servers: about 5000 TCP connections are listed by netstat -alnp at any given time, and I think about 1000 connections open and close every second on average. Something like this regularly shows up in the kernel log (but the port number is different from the one used by the server application described above):
TCP: Possible SYN flooding on port XXXXX. Sending cookies.
net_ratelimit: 19 callbacks suppressed
Here's what a TCP session usually looks like: http://i54.tinypic.com/1zz10mx.jpg