Race condition: Stale DTLS packets cause IPC desync and auth failure
I have identified a race condition in ocserv where a UDP packet from a previous DTLS session can be delivered to a new worker process before that worker has finished authentication. This desynchronizes the worker's IPC message stream and causes an immediate disconnection.
This is particularly common with clients such as AnyConnect, which perform a "Reconnecting (optimizing connection)" step shortly after the initial tunnel is established.
Here is how it happens
- Client reconnects: AnyConnect performs a transparent reconnection to "optimize connection" (likely to set specific options on the tunnel interface). Since it holds a session cookie, the user is not prompted for credentials. It opens a new TCP connection (for which ocserv spawns a new worker), establishes a TLS session, and issues an HTTP CONNECT request carrying the cookie.
- Auth request: The new worker sends an AUTH_COOKIE_REQ message to the main process to verify the cookie and waits for an AUTH_COOKIE_REP reply.
- Session reuse: After verifying the cookie, the main process finds the existing session for this client and terminates the old worker with SIGTERM in order to take over its IP address.
- Socket closure: When the old worker terminates, its client-specific UDP socket is closed. Any "in-flight" UDP packets from the old DTLS session are now picked up by the global listening socket in the main process.
- Stale packet arrival: The main process receives a stale UDP packet and passes the UDP socket to the new worker via CMD_UDP_FD.
- Auth failure: The new worker is waiting for AUTH_COOKIE_REP, but instead it receives CMD_UDP_FD. Because the message type does not match the one it expects, the worker errors out and terminates the connection.
Log evidence
ocserv[1567117]:main: 111.222.33.44:2443: unexpected DTLS content type: 23; possibly a firewall disassociated a UDP session
ocserv[1567117]:main[user]:111.222.33.44:2477 sending (socket) message 10 to worker (CMD_UDP_FD)
ocserv[1567117]:main[user]:111.222.33.44:2477 passed UDP socket from 111.222.33.44:2443
ocserv[152041]:common/common.c:713: expected 2, received 10 <------- AUTH_COOKIE_REP was expected, but CMD_UDP_FD was received
ocserv[152041]:worker: 111.222.33.44 worker-auth.c:642: error receiving auth reply message
ocserv[152041]:worker: 111.222.33.44 error receiving cookie authentication reply
ocserv[152041]:worker: 111.222.33.44 failed cookie authentication attempt
Current Impact
AnyConnect clients are disconnected almost immediately after authenticating. Currently, the only workaround is to disable DTLS for them.
Proposed Solution
I have a simple patch that works around this by not forwarding non-ClientHello DTLS packets to the new worker. It has worked without issues in my production environment for a long time.
However, a more robust fix is likely needed: the fact that a UDP packet from an old session can end up in a new session is a serious problem in its own right.
Potential long-term solutions could include:
- Terminating the DTLS session gracefully with a BYE packet.
- Implementing a mechanism similar to TIME-WAIT in TCP/IP to ensure all remaining packets from the old connection have expired.