Skip to content

Improve graceful disconnection handling for LoadBalancing::Host

Extracted from !130432 (comment 1536438094)

In !130432 (merged) we reduced the amount of time we spend waiting to gracefully disconnect hosts before force disconnecting. The idea of "graceful" disconnection, however, is flawed in 2 ways. It looks like:

        def try_disconnect
          if pool.connections.none?(&:in_use?)
            pool.disconnect!
            return true
          end

The first problem here is a threading race condition where connections can become "in use" before we trigger disconnect!. But the second flaw is that so long as the pool is healthy and the application is receiving steady traffic it's very likely there will never be pool.connections.none?(&:in_use?) because the application will constantly be checking out and in connections from the pool for traffic.

A better way to handle both of the above problems is to mark a pool or LoadBalancing::Host as @disconnecting first and then ensure that our LoadBalancing::LoadBalancer never picks a Host that is @disconnecting. Then we can fix the race condition because we know that once we have no connections in use then we won't get any more. Plus we can fix the main problem that we never actually get to 0 connections in use. This might get rid of the cases where graceful disconnects timeout and could substantially reduce the time it takes to handle disconnections in ServiceDiscovery.