Atomic generation of internal ids for issues.
What does this MR do?
This MR addresses generation of internal ids.
The new scheme has the following properties:
- Leverages
FOR UPDATE
row-level locks while keeping the lock scope as small as possible - No data migration needed (or even not desired)
- Per-project sequences are guaranteed to be gapless
- Per-project internal ids are guaranteed to be unique (per type: Issue, MR, Pipeline, ...)
- Not a lock-free solution, thus concurrency is a concern
The good thing about this is its robustness: Even if a record in internal_ids
does not exist yet - we calculate the value on the fly (like we do today for every new Issue) and create the record. Thus, a full data migration is not needed and not desired. This will also help in the future when there's a need to migrate this table.
The lock strategy is based on finding the corresponding record in internal_ids
and lock it with FOR UPDATE
. As such, this is straight forward.
It becomes more complicated if we tolerate the absence of said record. This is beneficial because we don't need to run a full-blown data migration to support this. OTOH, there's nothing we can synchronize on. Here, we discussed two options:
- Lock the record that scopes ID generation, e.g. project with
FOR UPDATE
- Blind insert into
internal_ids
and react to duplicate key errors
1 includes a possible deadlock (more). 2 is more favorable and included here.
Other use cases
The generation scheme can be included with:
has_internal_id :iid, scope: :project, init: ->(s) { s.project.issues.maximum(:iid) }
For a list of use-cases we know of today, see https://gitlab.com/gitlab-org/gitlab-ce/issues/31114#note_62039703.
Let's consider a case where we want generate internal ids for Build
and scope by Pipeline
instead of Project
:
- Add column
pipeline_id integer null
tointernal_ids
(FK topipelines
) - Add unique index
on internal_ids (scope, pipeline_id)
(maybe excludeNULLs
forpipeline_id
) - Add value to
InternalId.scope
enum - Include module in
Build
model:
has_internal_id :iid, scope: :pipeline, init: ->(s) { s.pipeline.builds.maximum(:iid) }
Are there points in the code the reviewer needs to double check?
- Check for race conditions
- Is the inline-documentation useful or tldr?
Why was this MR needed?
The existing implementation is not atomic, which leads to concurrency problems where two or more processes are assigned the same internal id while creating e.g. an Issue. Only one of these succeeds while the others hit a unique key constraint violation.
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary -
Documentation created/updated -
Tests added for this feature/bug - Review
-
Has been reviewed by Backend -
Has been reviewed by Database
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Squashed related commits together -
End-to-end tests pass ( package-qa
manual pipeline job)
What are the relevant issue numbers?
Closes #31114 (closed)