Fork child hangs if client sends a TCP RST immediately after connecting

We are running an SSH server built with libssh, and recently we noticed a lot of hanging child processes. After some investigation, we found out how it happens:

  1. A client connects to the server; the server accepts the connection and forks a child;
  2. The client sends a TCP RST immediately after connecting (quite possibly it's not the client but China's GFW);
  3. The child process gets the error (POLLERR|POLLHUP in the log below) while polling during the key exchange phase, but the socket is still in state SSH_SOCKET_CONNECTING, so lines 263-279 of socket.c are executed. They close the socket and set the file descriptor to -1 without changing the session state to SSH_SESSION_STATE_ERROR;
  4. Since the session state still looks good, the process keeps polling with the freed socket and polling context. poll() ignores an entry whose fd is negative, so with an infinite timeout the call never returns and the child hangs forever.

Below is a sample strace log:

[pid  2565] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fed472f8b50) = 26986
Process 26986 attached
[pid 26986] set_robust_list(0x7fed472f8b60, 24 <unfinished ...>
[pid  2565] close(4 <unfinished ...>
[pid 26986] <... set_robust_list resumed> ) = 0
[pid  2565] <... close resumed> )       = 0
[pid 26986] close(3)                    = 0
[pid  2565] accept(3,  <unfinished ...>
[pid 26986] open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3
[pid 26986] fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
[pid 26986] poll([{fd=3, events=POLLIN}], 1, 10) = 1 ([{fd=3, revents=POLLIN}])
[pid 26986] read(3, "\251g\26\312\301\22\371\221\3\1D\242=\303\\\260\204\221\253\20\vp8Ex\0\276\335\362r\303\206"..., 48) = 48
[pid 26986] close(3)                    = 0
[pid 26986] getuid()                    = 1000
[pid 26986] poll([{fd=4, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=4, revents=POLLIN|POLLOUT|POLLERR|POLLHUP}])
[pid 26986] getsockopt(4, SOL_SOCKET, SO_ERROR, [104], [4]) = 0
[pid 26986] close(4)                    = 0
[pid 26986] poll([{fd=-1}], 1, 4294967295

The solution I would propose is to change the state of the socket to SSH_SOCKET_CONNECTED when the server accepts a new connection.
