Atomic generation of internal ids for issues. (!17580) · Merge requests · GitLab.org / GitLab FOSS

Andreas Brandl requested to merge 31114-internal-ids-are-not-atomic into master Mar 06, 2018

What does this MR do?

This MR addresses generation of internal ids.

The new scheme has the following properties:

Leverages FOR UPDATE row-level locks while keeping the lock scope as small as possible
No data migration needed (or even not desired)
Per-project sequences are guaranteed to be gapless
Per-project internal ids are guaranteed to be unique (per type: Issue, MR, Pipeline, ...)
Not a lock-free solution, thus concurrency is a concern

The good thing about this is its robustness: Even if a record in internal_ids does not exist yet - we calculate the value on the fly (like we do today for every new Issue) and create the record. Thus, a full data migration is not needed and not desired. This will also help in the future when there's a need to migrate this table.

The lock strategy is based on finding the corresponding record in internal_ids and lock it with FOR UPDATE. As such, this is straight forward.

It becomes more complicated if we tolerate the absence of said record. This is beneficial because we don't need to run a full-blown data migration to support this. OTOH, there's nothing we can synchronize on. Here, we discussed two options:

Lock the record that scopes ID generation, e.g. project with FOR UPDATE
Blind insert into internal_ids and react to duplicate key errors

1 includes a possible deadlock (more). 2 is more favorable and included here.

Other use cases

The generation scheme can be included with:

has_internal_id :iid, scope: :project, init: ->(s) { s.project.issues.maximum(:iid) }

For a list of use-cases we know of today, see https://gitlab.com/gitlab-org/gitlab-ce/issues/31114#note_62039703.

Let's consider a case where we want generate internal ids for Build and scope by Pipeline instead of Project:

Add column pipeline_id integer null to internal_ids (FK to pipelines)
Add unique index on internal_ids (scope, pipeline_id) (maybe exclude NULLs for pipeline_id)
Add value to InternalId.scope enum
Include module in Build model:

has_internal_id :iid, scope: :pipeline, init: ->(s) { s.pipeline.builds.maximum(:iid) }

Are there points in the code the reviewer needs to double check?

Check for race conditions
Is the inline-documentation useful or tldr?

Why was this MR needed?

The existing implementation is not atomic, which leads to concurrency problems where two or more processes are assigned the same internal id while creating e.g. an Issue. Only one of these succeeds while the others hit a unique key constraint violation.

Does this MR meet the acceptance criteria?

Changelog entry added, if necessary
Documentation created/updated
Tests added for this feature/bug
Review
- Has been reviewed by Backend
- Has been reviewed by Database
Conform by the merge request performance guides
Conform by the style guides
Squashed related commits together
End-to-end tests pass (package-qa manual pipeline job)

What are the relevant issue numbers?

Closes #31114 (closed)

Edited Mar 13, 2018 by Andreas Brandl

Atomic generation of internal ids for issues.