We have a network application that is used inside different scripts to communicate with other systems.
Sometimes the scripts hang when calling our network application. We recently ran into such a hang, and I tried to debug the hung process of this application.
The application consists of a client and a server (a daemon); the hang occurs on the client side.
The strace output showed me that it hangs in a select system call.
> strace -p 34567
select(4, [3], NULL, NULL, NULL
As you can see, no timeout is given to the select call, so it can block indefinitely if file descriptor '3' never becomes ready for reading.
The lsof output showed that fd '3' is in the FIN_WAIT2 state.
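To illustrate the difference a timeout makes, here is a minimal sketch in Python (our client is not written this way; the helper name wait_readable and the 0.2 second timeout are made up for the example). It uses a local socket pair instead of the real connection.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if sock becomes readable within timeout_s seconds, else False."""
    # The fourth argument is the timeout; passing None would block forever,
    # which corresponds to the NULL timeout seen in the strace output.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


# Hypothetical stand-in for the client/daemon connection
a, b = socket.socketpair()

# Nothing has been sent yet, so the wait gives up after the timeout
timed_out = not wait_readable(a, 0.2)

# Once the peer sends a byte, the same call returns right away
b.sendall(b"x")
ready = wait_readable(a, 0.2)

a.close()
b.close()
```

With a timeout the caller at least gets control back and can decide whether to retry or give up, instead of sitting in select forever.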
> lsof -p 34567
client 34567 user 3u IPv4 55184032 TCP client-box:smar-se-port2->server:daemon (FIN_WAIT2)
Why is the socket in FIN_WAIT2? And with fd '3' in that state, shouldn't select() return?
Any ideas?
The OS is SuSE Linux.
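For anyone who wants to reproduce a similar state in isolation, here is a rough sketch of one way a socket can sit in FIN_WAIT2 while a read-side select keeps waiting. It assumes the client half-closes its side and the peer never closes its own; whether that is what actually happens in our client is only a guess. The loopback listener and all variable names are invented for the example.

```python
import select
import socket

# A listener on an ephemeral loopback port stands in for the daemon
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

# The client sends its FIN; once the peer's kernel ACKs it, the client
# socket is in FIN_WAIT2 for as long as the server keeps its end open.
client.shutdown(socket.SHUT_WR)

# The server has neither sent data nor closed, so the client is not readable.
# Without the 0.3 s timeout this select would block indefinitely.
stuck = not select.select([client], [], [], 0.3)[0]

# Only when the server closes its end (sending its own FIN) does select wake up
server_side.close()
woke = bool(select.select([client], [], [], 1.0)[0])
eof = client.recv(1)  # empty bytes signal end-of-stream

client.close()
listener.close()
```

While the sketch is paused between the shutdown and the server-side close, the client end should show up as FIN_WAIT2 in lsof or netstat, much like fd '3' above.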