
DB multi-request stickiness doesn't stick as expected

Extracted from gitlab-org/gitlab!49294 (comment 543682086)

Our DB Load Balancing layer has a special sticking mechanism that spans requests. It ensures read consistency despite replication lag on the replicas. The flow looks like this:

  • The stickiness namespace is configurable. By default, the namespace is the user ID.
  • When a request ends, a Rack middleware writes the current write location into Redis if the request performed any write.
  • On the next request, inside the middleware, that last write location is compared with each replica's LSN (log sequence number) to determine whether all replicas have caught up. If any replica has not, all queries in that session stick to the primary.
  • In API endpoints, the stickiness namespace can be set once the endpoint's main object is found:

https://gitlab.com/gitlab-org/gitlab/blob/ace8bf48879f80da4c0f652b2afd38b7ac739903/ee/lib/ee/api/helpers/runner.rb

        def current_job
          id = params[:id]

          if id
            ::Gitlab::Database::LoadBalancing::RackMiddleware
              .stick_or_unstick(env, :build, id)
          end

          super
        end

        override :current_runner
        def current_runner
          token = params[:token]

          if token
            ::Gitlab::Database::LoadBalancing::RackMiddleware
              .stick_or_unstick(env, :runner, token)
          end

          super
        end

Or https://gitlab.com/gitlab-org/gitlab/blob/ace8bf48879f80da4c0f652b2afd38b7ac739903/ee/lib/ee/api/helpers.rb

      def current_user
        strong_memoize(:current_user) do
          user = super

          if user
            ::Gitlab::Database::LoadBalancing::RackMiddleware
              .stick_or_unstick(env, :user, user.id)
          end

          user
        end
      end
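The flow described above can be sketched in Ruby as follows. This is a minimal illustration, not GitLab's implementation: it assumes an in-memory Hash in place of Redis and takes replica LSNs as plain PostgreSQL-style strings supplied by the caller. The class and method names (StickinessSketch, record_write, must_stick?) are hypothetical.

```ruby
# Hypothetical sketch of cross-request sticking; not GitLab's implementation.
# A Hash stands in for Redis, and replica LSNs are supplied by the caller.
class StickinessSketch
  def initialize
    @store = {} # "namespace:id" => last write location (LSN string)
  end

  # PostgreSQL prints an LSN as two hex numbers: upper 32 bits / lower 32 bits.
  def self.parse_lsn(text)
    upper, lower = text.split('/').map { |part| part.to_i(16) }
    (upper << 32) | lower
  end

  # End of a request that wrote to the primary: remember where it wrote.
  def record_write(namespace, id, write_location)
    @store["#{namespace}:#{id}"] = write_location
  end

  # Next request: stick to the primary unless every replica has replayed
  # up to the recorded write location. Once all have, forget the key.
  def must_stick?(namespace, id, replica_lsns)
    key = "#{namespace}:#{id}"
    last_write = @store[key]
    return false if last_write.nil?

    target = self.class.parse_lsn(last_write)
    caught_up = replica_lsns.all? { |lsn| self.class.parse_lsn(lsn) >= target }
    @store.delete(key) if caught_up
    !caught_up
  end
end

sticking = StickinessSketch.new
sticking.record_write(:user, 42, '0/16B3748')
p sticking.must_stick?(:user, 42, ['0/16B3748', '0/16B3000']) # => true (second replica lags)
p sticking.must_stick?(:user, 42, ['0/16B3748', '0/16B3800']) # => false (all caught up)
```

Deleting the key once every replica has caught up loosely mirrors the "unstick or continue sticking" idea named in the middleware code, but the sketch does not reproduce that method.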

If an endpoint decides to scope stickiness to a namespace, it stores the namespace in the request's env hash:

        def self.stick_or_unstick(env, namespace, id)
          return unless LoadBalancing.enable?

          Sticking.unstick_or_continue_sticking(namespace, id)

          env[STICK_OBJECT] = [namespace, id]
        end

However, because the hash key is fixed, each call overwrites the previous value, so only the last namespace received is kept. Hence, when a request ends, the last write location is written for only one namespace, and subsequent requests scoped to any of the other namespaces do not stick as expected.
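To make the overwrite concrete, here is a minimal Ruby sketch that models env as a plain Hash. The value given to STICK_OBJECT and the stick_or_unstick_sketch helper are hypothetical; the helper keeps only the env assignment from stick_or_unstick and omits the LoadBalancing.enable? check and the Sticking call.

```ruby
# Hypothetical stand-in for the middleware's env key; the real constant's
# value is not shown in this issue.
STICK_OBJECT = 'load_balancing.stick_object'

# Mirrors only the env assignment in stick_or_unstick: one fixed key.
def stick_or_unstick_sketch(env, namespace, id)
  env[STICK_OBJECT] = [namespace, id]
end

# A request that resolves both a runner and a job.
env = {}
stick_or_unstick_sketch(env, :runner, 'runner-token')
stick_or_unstick_sketch(env, :build, 7)

# Only the last namespace survives; the :runner entry was overwritten,
# so no write location would be recorded for it when the request ends.
p env[STICK_OBJECT] # => [:build, 7]
```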

Solution

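Expand env[STICK_OBJECT] into an array of [namespace, id] pairs and update its callers accordingly. For example, if a request involves three namespaces, the corresponding namespace is checked as each object is initialized; if any of them has a lagging write location, the request sticks to the primary. When the request ends, the write location can then be recorded for every namespace in the array instead of only the last one.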
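A minimal Ruby sketch of the proposed fix, under stated assumptions: a Hash stands in for Redis, LSNs are plain Integers for brevity, and an env flag stands in for "use the primary for this request". The module name, both env keys, and the after_request helper are hypothetical and not GitLab's API.

```ruby
# Hypothetical sketch of the proposed fix; names are illustrative only.
# A Hash stands in for Redis; LSNs are Integers for brevity.
module StickinessFixSketch
  STICK_OBJECTS = 'load_balancing.stick_objects' # hypothetical env key
  USE_PRIMARY = 'load_balancing.use_primary'     # hypothetical env flag

  module_function

  # Called as each main object is found. Checks that namespace right away
  # and appends it to the array instead of overwriting a single slot.
  def stick_or_unstick(env, store, namespace, id, replica_lsns)
    key = "#{namespace}:#{id}"
    last_write = store[key]
    if last_write && replica_lsns.any? { |lsn| lsn < last_write }
      env[USE_PRIMARY] = true # one lagging namespace is enough to stick
    elsif last_write
      store.delete(key)       # every replica caught up: unstick
    end
    (env[STICK_OBJECTS] ||= []) << [namespace, id]
  end

  # End of request: if it wrote, record the write location for every
  # namespace the request touched, not just the last one.
  def after_request(env, store, wrote:, write_lsn:)
    return unless wrote

    Array(env[STICK_OBJECTS]).uniq.each do |namespace, id|
      store["#{namespace}:#{id}"] = write_lsn
    end
  end
end

store = {}

# Request 1 resolves a runner and a job, then writes up to LSN 150.
env = {}
StickinessFixSketch.stick_or_unstick(env, store, :runner, 'token', [100])
StickinessFixSketch.stick_or_unstick(env, store, :build, 7, [100])
StickinessFixSketch.after_request(env, store, wrote: true, write_lsn: 150)
p store.keys # => ["runner:token", "build:7"]

# Request 2 only resolves the runner; one replica is still at 120.
env2 = {}
StickinessFixSketch.stick_or_unstick(env2, store, :runner, 'token', [120, 160])
p env2[StickinessFixSketch::USE_PRIMARY] # => true
```

With the single fixed key, request 1 would have recorded only "build:7", so request 2 would not have stuck to the primary.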