Improve the efficiency of non-strictly monotonic (non-gapless) IIDs for pipelines

Introduction

In gitlab-com/gl-infra/production#4051 (closed), the primary database on GitLab.com suffered from contention on the internal_ids table.

This led to downstream saturation in pgbouncer, sidekiq, web, api, and git services.

Cause

The cause of this contention was slow client transactions locking internal_ids with SELECT FOR UPDATE, which were then blocking other transactions from obtaining an ID.

This is because the blocked transactions cannot progress until the previous IID has been committed to the database, since we rely on strictly monotonic IDs: that is, each ID follows the previous one by exactly 1 and there are never any gaps.
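As a rough illustration of the blocking described above, here is a minimal in-memory sketch. It is not GitLab code: a Ruby Mutex stands in for the row lock taken by SELECT FOR UPDATE, and the variable names are hypothetical.

```ruby
# Hypothetical model: a Mutex plays the role of the row lock on internal_ids.
row_lock = Mutex.new
last_value = 0
second_caller_blocked = nil

# First transaction: takes the lock, bumps the counter, and stays open.
row_lock.synchronize do
  last_value += 1
  # While it is still open, a second transaction (another thread) tries to
  # take the same lock and cannot.
  second_caller_blocked = Thread.new { !row_lock.try_lock }.value
end

# Once the first transaction "commits", the lock is available again.
lock_free_after_commit = row_lock.try_lock
row_lock.unlock if lock_free_after_commit
```

The longer the first transaction stays open, the longer every other caller for the same counter has to wait.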

Proposal

  1. For some high-contention classes, allow non-strict monotonic IDs: that is, each ID is greater than the previous one, but occasionally there may be gaps, for example when an ID has been issued and the transaction that requested it then rolls back.
  2. Allow certain classes, such as Pipeline, to use non-strict monotonic sequences.
  3. It's unlikely that any user would notice the occasional missing ID for pipelines.
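To make the two definitions concrete, the following sketch checks a list of IDs against each property. The method names gapless? and increasing? are made up for this example.

```ruby
# Strictly monotonic in the sense used here: each ID is exactly previous + 1.
def gapless?(ids)
  ids.each_cons(2).all? { |previous, current| current == previous + 1 }
end

# Non-strict in the sense proposed here: each ID is greater than the previous
# one, but gaps are allowed.
def increasing?(ids)
  ids.each_cons(2).all? { |previous, current| current > previous }
end

with_gap = [1, 2, 4, 5] # IID 3 was issued, then its transaction rolled back

gapless?(with_gap)   # false
increasing?(with_gap) # true
```

A sequence with a gap fails the first check but passes the second, which is the only guarantee the proposal would keep for pipelines.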

This could potentially be done with a new mixin, NonStrictAtomicInternalId (for example).

# For scaling purposes, allow non-strict monotonic sequences
module Ci
  class Pipeline < ApplicationRecord
    extend Gitlab::Ci::Model

    include NonStrictAtomicInternalId # instead of AtomicInternalId

    # ...
  end
end
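What such a mixin might look like is sketched below in plain Ruby. Everything here is an assumption for illustration: the real AtomicInternalId API is not reproduced, the per-class Hash stands in for the internal_ids table, and allocate_iid, assign_iid! and DemoPipeline are hypothetical names.

```ruby
module NonStrictAtomicInternalId
  def self.included(base)
    base.extend(ClassMethods)
    # In-memory stand-in for internal_ids: one counter per scope key.
    base.instance_variable_set(:@iid_counters, Hash.new(0))
    base.instance_variable_set(:@iid_lock, Mutex.new)
  end

  module ClassMethods
    # The lock is held only for the increment, never for the caller's
    # surrounding transaction.
    def allocate_iid(scope_key)
      @iid_lock.synchronize { @iid_counters[scope_key] += 1 }
    end
  end

  # Assigns an IID once; later calls return the same value.
  def assign_iid!
    @iid ||= self.class.allocate_iid(project_id)
  end
end

# Hypothetical model used only to exercise the mixin.
class DemoPipeline
  include NonStrictAtomicInternalId

  attr_reader :project_id, :iid

  def initialize(project_id)
    @project_id = project_id
  end
end
```

In this sketch the counters are scoped per project, so two projects can each have a pipeline with IID 1.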

Implementation

Obtaining the ID would need to be done in a non-nested separate transaction, and therefore through a separate connection to Postgres.

This transaction would be committed immediately and the new ID passed back to the caller (or, even better, an implicit transaction could be used for the single statement).
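The effect of committing the ID separately from the caller's transaction can be sketched with a small in-memory model. This is not real database code: FakeDatabase, Rollback and create_pipeline are hypothetical, and the auto-committed allocate_iid call stands in for the single statement issued on the separate connection.

```ruby
class Rollback < StandardError; end

class FakeDatabase
  attr_reader :last_iid, :pipeline_iids

  def initialize
    @last_iid = 0
    @pipeline_iids = []
    @lock = Mutex.new
  end

  # Stand-in for the separate connection: the increment is committed
  # immediately and is never undone by the caller's transaction.
  def allocate_iid
    @lock.synchronize { @last_iid += 1 }
  end

  # Stand-in for the caller's transaction: writes are buffered and only
  # applied if the block finishes without raising Rollback.
  def transaction
    pending = []
    yield pending
    @pipeline_iids.concat(pending)
    true
  rescue Rollback
    false
  end
end

db = FakeDatabase.new

create_pipeline = lambda do |should_fail|
  db.transaction do |pending|
    iid = db.allocate_iid
    raise Rollback if should_fail
    pending << iid
  end
end

# The second pipeline rolls back after its IID has already been committed.
results = [false, true, false].map { |should_fail| create_pipeline.call(should_fail) }
```

After these three attempts the counter has advanced to 3 while only IIDs 1 and 3 were saved, so the rolled-back pipeline leaves a gap but never holds the counter while its own transaction is open.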

I'm not completely sure that this approach will work, which is why I'm raising it as a discussion. However, even if it's a little complicated, there's a good chance that it will help us reduce locking on our primary instance and potentially avoid other incidents like the one we saw today.

cc @pbair @iroussos @abrandl @craig-gomes

Edited by Andrew Newdigate