Skip to content

Atomic generation of internal ids for issues.

Andreas Brandl requested to merge 31114-internal-ids-are-not-atomic into master

What does this MR do?

This MR addresses generation of internal ids.

The new scheme has the following properties:

  1. Leverages FOR UPDATE row-level locks while keeping the lock scope as small as possible
  2. No data migration needed (or even not desired)
  3. Per-project sequences are guaranteed to be gapless
  4. Per-project internal ids are guaranteed to be unique (per type: Issue, MR, Pipeline, ...)
  5. Not a lock-free solution, thus concurrency is a concern

The good thing about this is its robustness: Even if a record in internal_ids does not exist yet - we calculate the value on the fly (like we do today for every new Issue) and create the record. Thus, a full data migration is not needed and not desired. This will also help in the future when there's a need to migrate this table.

The lock strategy is based on finding the corresponding record in internal_ids and lock it with FOR UPDATE. As such, this is straight forward.

It becomes more complicated if we tolerate the absence of said record. This is beneficial because we don't need to run a full-blown data migration to support this. OTOH, there's nothing we can synchronize on. Here, we discussed two options:

  1. Lock the record that scopes ID generation, e.g. project with FOR UPDATE
  2. Blind insert into internal_ids and react to duplicate key errors

1 includes a possible deadlock (more). 2 is more favorable and included here.

Other use cases

The generation scheme can be included with:

has_internal_id :iid, scope: :project, init: ->(s) { s.project.issues.maximum(:iid) }

For a list of use-cases we know of today, see https://gitlab.com/gitlab-org/gitlab-ce/issues/31114#note_62039703.

Let's consider a case where we want generate internal ids for Build and scope by Pipeline instead of Project:

  1. Add column pipeline_id integer null to internal_ids (FK to pipelines)
  2. Add unique index on internal_ids (scope, pipeline_id) (maybe exclude NULLs for pipeline_id)
  3. Add value to InternalId.scope enum
  4. Include module in Build model:
has_internal_id :iid, scope: :pipeline, init: ->(s) { s.pipeline.builds.maximum(:iid) }

Are there points in the code the reviewer needs to double check?

  1. Check for race conditions
  2. Is the inline-documentation useful or tldr?

Why was this MR needed?

The existing implementation is not atomic, which leads to concurrency problems where two or more processes are assigned the same internal id while creating e.g. an Issue. Only one of these succeeds while the others hit a unique key constraint violation.

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #31114 (closed)

Edited by Andreas Brandl

Merge request reports