Commit 172c7141 authored by Prabakaran Murugesan, committed by Steve Xuereb

Update Cell Database Sequence ID Docs

parent c9d68014
1 merge request: !11094 Update Cell Database Sequence ID Docs
......@@ -44,7 +44,6 @@ This section links all different technical proposals that are being evaluated.
- Planned: Indexing Service
- [Mutual authentication between Cell services](mutual_authentication_between_cell_services.md)
- [Feature Flags](./infrastructure/feature_flags.md) - ([Previous iteration](feature_flags.md))
- [Cluster wide unique sequences](unique_sequences.md)
- [Cells: Infrastructure](./infrastructure/_index.md)
- [Organization migration](migration.md)
- [Routable Tokens](routable_tokens.md)
......@@ -194,4 +193,4 @@ The Tenant Scale team sees an opportunity to use GitLab Dedicated as a base for
- [Database group investigation](../../../infrastructure-platforms/data-access/database-framework/doc/root-namespace-sharding/)
- [Shopify Pods architecture](https://shopify.engineering/a-pods-architecture-to-allow-shopify-to-scale)
- [Opstrace architecture](https://gitlab.com/gitlab-org/opstrace/opstrace/-/blob/main/docs/architecture/overview.md)
- [Adding Diagrams to this blueprint](diagrams/index.md)
- [Adding Diagrams to this blueprint](diagrams/_index.md)
......@@ -13,28 +13,15 @@ and different solutions were discussed in <https://gitlab.com/gitlab-org/core-pl
## Decision
All Cells will have bigint IDs on creation. While provisioning, each of them will get a
large range of sequences to use from the [Topology Service](../topology_service.md).
On decommissioning the cell, these ranges will be
returned back to the topology service. If the returned range is large enough for another cell, it could be handed out to
them so that the short-lived cells won't exhaust large parts of the key range.
All cells will have bigint IDs on creation. While provisioning, each of them will get a
range of sequences to use from the [Topology Service](../topology_service.md). This range is used to set
`minval` and `maxval` for all existing and newly created sequence IDs.
We will update the Legacy Cell's sequence to have a `maxval`. It will be the minimum possible range, to make sure it
won't collide with any Cells.
## Consequences
The above decision supports up to [Cells 1.5](../iterations/cells-1.5.md) but not [Cells 2.0](../iterations/cells-2.0.md).
To support Cells 2.0 (that is, to allow moving organizations from
Cells to the Legacy Cell), all integer IDs in the Legacy Cell must be converted to `bigint`. This is an
ongoing effort as part of [core-platform-section/data-stores/-/issues/111](https://gitlab.com/gitlab-org/core-platform-section/data-stores/-/issues/111)
and is estimated to take around 12 months.
The Topology Service uses the logic explained [here](../topology_service.md#logic-to-compute-the-range) to compute the sequence range.
## Alternatives
In addition to the [earliest proposal](../rejected/impacted_features/database_sequences.md), we evaluated
the solutions below before making the final decision.
Below are the different solutions considered for this problem.
- [Solution 1: Global Service to claim sequences](https://gitlab.com/gitlab-org/core-platform-section/data-stores/-/issues/102#note_1853252715)
- [Solution 2: Converting all int IDs to bigint to generate uniq IDs](https://gitlab.com/gitlab-org/core-platform-section/data-stores/-/issues/102#note_1853260434)
......
---
stage: enablement
group: Tenant Scale
title: 'Cells: Database Sequences'
status: rejected
toc_hide: true
---
{{< design-document-header >}}
_This was superseded by the [Cells: Unique sequences](../../unique_sequences.md) blueprint._
{{% alert %}}
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
{{% /alert %}}
GitLab today ensures that every database row created has a unique ID, which allows accessing a merge request, CI Job or Project by a known global ID.
Cells will use many distinct, unconnected databases, each of them having a separate ID for most entities.
At a minimum, any ID referenced between a Cell and the shared schema will need to be unique across the cluster to avoid ambiguous references.
Further to required global IDs, it might also be desirable to retain globally unique IDs for all database rows to allow migrating resources between Cells in the future.
## 1. Definition
## 2. Data flow
## 3. Proposal
These are some preliminary ideas for how we can retain unique IDs across the system.
### 3.1. UUID
Instead of using incremental sequences, use UUID (128 bit) that is stored in the database.
- This might break existing IDs and requires adding a UUID column for all existing tables.
- This makes all indexes larger as it requires storing 128 bit instead of 32/64 bit in index.
### 3.2. Use Cell index encoded in ID
Because a significant number of tables already use 64 bit ID numbers we could use MSB to encode the Cell ID:
- This might limit the number of Cells that can be enabled in a system, as we might decide to only allocate 1024 possible Cell numbers.
- This would make it possible to migrate IDs between Cells, because even if an entity from Cell 1 is migrated to Cell 100 this ID would still be unique.
- If resources are migrated the ID itself will not be enough to decode the Cell number and we would need a lookup table.
- This requires updating all IDs to 64 bits.
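
As a rough illustration of the bit layout this option implies, the sketch below packs a Cell number into the most significant usable bits of a signed 64-bit ID. The 10-bit Cell field corresponds to the 1024 Cell numbers mentioned above. The function names and the resulting 53-bit sequence width are assumptions made for illustration, not part of the proposal.

```python
CELL_BITS = 10             # assumption: 1024 possible Cell numbers
SEQ_BITS = 63 - CELL_BITS  # remaining bits of a signed 64-bit ID (sign bit stays 0)


def encode_id(cell: int, seq: int) -> int:
    """Pack a Cell number and a Cell-local sequence value into one ID."""
    if not 0 <= cell < (1 << CELL_BITS):
        raise ValueError("cell number out of range")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("sequence value out of range")
    return (cell << SEQ_BITS) | seq


def decode_id(value: int) -> tuple[int, int]:
    """Split an ID back into (cell, seq). Only valid while the row has not moved."""
    return value >> SEQ_BITS, value & ((1 << SEQ_BITS) - 1)
```

Once a record is migrated to another Cell, `decode_id` still returns the Cell that created it, which is why a lookup table would be needed to find its current location.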
### 3.3. Allocate sequence ranges from central place
Each Cell might receive its own range of sequences as they are consumed from a centrally managed place.
Once a Cell consumes all IDs assigned for a given table it would be replenished and a next range would be allocated.
Ranges would be tracked to provide a faster lookup table if a random access pattern is required.
- This might make IDs migratable between Cells, because even if an entity from Cell 1 is migrated to Cell 100 this ID would still be unique.
- If resources are migrated the ID itself will not be enough to decode the Cell number and we would need a much more robust lookup table as we could be breaking previously assigned sequence ranges.
- This does not require updating all IDs to 64 bits.
- This adds some performance penalty to all `INSERT` statements in Postgres or at least from Rails as we need to check for the sequence number and potentially wait for our range to be refreshed from the ID server.
- The available range will need to be stored and incremented in a centralized place so that concurrent transactions cannot possibly get the same value.
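
The replenishment idea can be sketched as follows. This is a minimal in-process model, assuming a hypothetical `CentralAllocator` that hands out fixed-size blocks per table and a hypothetical `CellSequence` that requests a new block when its current one is used up. A real implementation would be a network service with durable storage, which is where the `INSERT` latency concern above comes from.

```python
import threading


class CentralAllocator:
    """Hypothetical central place that hands out non-overlapping ID blocks per table."""

    def __init__(self, block_size: int):
        self._block_size = block_size
        self._next_start = {}          # table name -> first unallocated ID
        self._lock = threading.Lock()  # concurrent claims must never overlap

    def claim(self, table: str) -> range:
        with self._lock:
            start = self._next_start.get(table, 1)
            self._next_start[table] = start + self._block_size
        return range(start, start + self._block_size)


class CellSequence:
    """Cell-local counter for one table; not thread-safe, for illustration only."""

    def __init__(self, allocator: CentralAllocator, table: str):
        self._allocator = allocator
        self._table = table
        self._ids = iter(())  # empty until the first block is claimed

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            # Current block is exhausted: ask the central place for the next one.
            self._ids = iter(self._allocator.claim(self._table))
            return next(self._ids)
```

With a block size of 3, two Cells drawing from the same table receive interleaved blocks (1 to 3, 4 to 6, 7 to 9, and so on), so an ID alone no longer identifies a single contiguous range per Cell.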
### 3.4. Define only some tables to require unique IDs
Maybe it is acceptable for only some tables to have globally unique IDs. It could be Projects, Groups and other top-level entities.
All other tables like `merge_requests` would only offer a Cell-local ID, but when referenced outside it would rather use an IID (an ID that is monotonic in context of a given resource, like a Project).
- This makes the ID 10000 for `merge_requests` be present on all Cells, which might be sometimes confusing regarding the uniqueness of the resource.
- This might make random access by ID (if ever needed) impossible without using a composite key, like: `project_id+merge_request_id`.
- This would require us to implement a transformation/generation of new ID if we need to migrate records to another Cell. This can lead to very difficult migration processes when these IDs are also used as foreign keys for other records being migrated.
- If IDs need to change when moving between Cells this means that any links to records by ID would no longer work even if those links included the `project_id`.
- If we plan to allow these IDs to not be unique and change the unique constraint to be based on a composite key then we'd need to update all foreign key references to be based on the composite key.
## 4. Evaluation
### 4.1. Pros
### 4.2. Cons
......@@ -134,24 +134,27 @@ Topology Service will make sure that the given range is not overlapping with oth
#### Logic to compute the range
```mermaid
graph TD
flowchart TD
A[64 bits] --> |1 bit - MSB| B[Sign]
A -->|6 bits| C[Reserved]
A -->|16 bits| D[CellID]
A -->|41 bits| E[Sequence]
A -->|57 bits| D[Sequence]
D --> E{Legacy Cell?}
E --> |Yes|F[min: 1, max: 10^12 - 1]
E --> |"No (new cells)"| G{'QA' bucket?}
G --> |Yes| H[min: currentMaxId + 1, max: min + 10^9 - 1]
G --> |No| I[min: currentMaxId + 1, max: min + 10^11 - 1]
```
- **Sign**: Always 0 for positive numbers.
- **Reserved**: Currently always `0`, reserved for 2 purposes.
1. To increase the number of cells, if needed.
1. To allow us to switch to a variant of ULID ID allocation in future without interfering with the existing IDs. Since
ULID based ID allocator will have the `timestamp` value in the most significant bits,
ULID based ID allocator will have the `timestamp` value in the most significant bits,
reserving only one bit would have been sufficient but
more bits are reserved to have the sequence bits at minimum.
- **CellID**: A unique auto-incrementing [unique identifier for a Cell](decisions/012_cell_unique_identifier.md) starting with `1`, can support up to 65,535 Cell IDs.
- **Sequence**: The sequence that will be used for each table in the database.
41 bits can support ~2 trillion IDs (2,199,023,255,551) per cell (per sequence).
At the time of writing, the largest ID is 11,098,430,930 (primary key of `security_findings` table), so it's 200 times the current largest ID, which is sufficient.
- **Sequence**:
- Legacy cell gets the first trillion IDs. QA cells get 1 billion IDs and other new cells get 100 billion IDs each.
- Assuming all the new cells created are non-QA and excluding the legacy cell, this will support 1,441,141 cells (using 57 bits).
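
The cell count above can be checked with a short calculation. The constant names are illustrative. The figures (57 sequence bits, 1 trillion IDs for the legacy cell, 100 billion IDs per non-QA cell) come from the list above.

```python
TOTAL_IDS = 2**57    # IDs addressable with 57 sequence bits
LEGACY_IDS = 10**12  # the legacy cell gets the first trillion IDs
CELL_IDS = 10**11    # each new non-QA cell gets 100 billion IDs

# Number of whole 100-billion ranges that fit after the legacy range.
max_cells = (TOTAL_IDS - LEGACY_IDS) // CELL_IDS
print(max_cells)  # 1441141
```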
Example `config.toml` of Topology Service:
......@@ -159,28 +162,60 @@ Example `config.toml` of Topology Service:
[[cells]]
id = 1
address = "legacy.gitlab.com"
sequence_range = [0, 2199023255551]
sequence_range = [1, 999999999999] # 1 trillion
buckets = ["paid", "free"]
status = "active"
[[cells]]
id = 2
address = "cell-2-example.gitlab.com"
sequence_range = [2199023255552, 4398046511103]
sequence_range = [1000000000000, 1099999999999] # 100 billion
buckets = ["paid", "free"]
status = "active"
[[cells]]
id = 3
address = "cells-3-test.gitlab.com"
sequence_range = [1100000000000, 1100999999999] # 1 billion
buckets = ["QA"]
status = "active"
[[cells]]
id = 4
address = "cells-4-example.gitlab.com"
sequence_range = [1101000000000, 1200999999999] # 100 billion
buckets = ["free"]
status = "active"
```
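
The ranges in the example configuration follow from the range logic described above. The sketch below reproduces them under the assumption that each new range starts right after the highest ID handed out so far. `next_range` and the constant names are hypothetical, not the Topology Service's actual API.

```python
LEGACY_RANGE = (1, 10**12 - 1)  # legacy cell: min 1, max 10^12 - 1
QA_SIZE = 10**9                 # cells in the 'QA' bucket get 1 billion IDs
DEFAULT_SIZE = 10**11           # other new cells get 100 billion IDs


def next_range(current_max_id: int, buckets: list[str]) -> tuple[int, int]:
    """Return (min, max) for a new cell, starting right after current_max_id."""
    size = QA_SIZE if "QA" in buckets else DEFAULT_SIZE
    minimum = current_max_id + 1
    return minimum, minimum + size - 1


# Rebuild the ranges of cells 1 to 4 from the example config.
ranges = [LEGACY_RANGE]
for buckets in (["paid", "free"], ["QA"], ["free"]):
    ranges.append(next_range(ranges[-1][1], buckets))

for cell_id, (low, high) in enumerate(ranges, start=1):
    print(cell_id, low, high)
```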
Calculation for `id = 1`:
- Status:
- ready: Cell is not yet ready to accept traffic, but we hold a slot.
- online: Cell is accepting traffic and is part of cluster discovery.
- offline: Cell is valid but not accepting traffic and is still part of cluster discovery.
- removed: Cell is removed and will never be active again.
Once a cell is `removed`, we will update its `sequence_range` with the _maxval_ consumed by the cell,
so that when a normal cell is removed (decommissioned), new QA cells can be given the unused IDs (if more than 1 billion remain).
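
A minimal sketch of that bookkeeping, assuming the removed cell reports the highest ID it actually consumed. The helper names are hypothetical, and treating "enough for a QA cell" as at least 1 billion IDs is an assumption.

```python
QA_SIZE = 10**9  # a QA cell needs 1 billion IDs


def release_unused(sequence_range: tuple[int, int], consumed_max: int):
    """Shrink a removed cell's range to what it used and return the freed remainder."""
    low, high = sequence_range
    if not low <= consumed_max <= high:
        raise ValueError("consumed_max must lie inside the cell's range")
    shrunk = (low, consumed_max)
    freed = (consumed_max + 1, high) if consumed_max < high else None
    return shrunk, freed


def fits_qa_cell(freed) -> bool:
    """True if the freed range is large enough to serve a new QA cell."""
    return freed is not None and freed[1] - freed[0] + 1 >= QA_SIZE
```

For example, if cell 2 from the configuration above is removed after consuming IDs up to `1000000004999`, its range shrinks to `[1000000000000, 1000000004999]` and the remainder is large enough for many QA cells.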
##### Sequence Saturation
At the time of writing, the largest ID in the legacy cell was ~11 billion (PK of the `security_findings` table), so
the legacy cell and new non-QA cells will have sufficient IDs to grow within their `sequence_range`.
QA cells might need more IDs because they are given only 1 billion IDs. Each cell's sequence data is monitored regularly,
and TS can provide an additional 1 billion IDs (from currentMaxId) to a cell if its consumption is over 99%.
- Sequences per cell: `2^41 -> 2199023255552`
- Sequence `min`: `(CellId - 1) * SequencesPerCell` -> `(1 - 1) * 2199023255552` -> `0`
- Sequence `max`: `(CellId * SequencesPerCell) - 1` -> `(1 * 2199023255552) - 1` -> `2199023255551`
[Issues#517296](https://gitlab.com/gitlab-org/gitlab/-/issues/517296) handles this.
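
The saturation rule can be sketched as below. How consumption is measured is not specified here, so the sketch assumes it is the share of the range below the sequence's last value. All names are hypothetical.

```python
EXTRA_IDS = 10**9  # additional IDs granted to a saturated cell
THRESHOLD = 0.99   # consumption above this triggers a top-up


def consumption(sequence_range: tuple[int, int], last_value: int) -> float:
    """Fraction of the range used so far (assumed definition)."""
    low, high = sequence_range
    return (last_value - low + 1) / (high - low + 1)


def needs_more_ids(sequence_range: tuple[int, int], last_value: int) -> bool:
    return consumption(sequence_range, last_value) > THRESHOLD


def extra_range(current_max_id: int) -> tuple[int, int]:
    """Additional range handed out from the cluster-wide currentMaxId."""
    return current_max_id + 1, current_max_id + EXTRA_IDS
```

Because the extra range starts at the cluster-wide `currentMaxId`, a topped-up cell ends up with more than one disjoint range.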
Calculation for `id = 2`:
NOTE:
- Sequences per cell: `2^41 -> 2199023255552`
- Sequence `min`: `(CellId - 1) * SequencesPerCell` -> `(2 - 1) * 2199023255552` -> `2199023255552`
- Sequence `max`: `(CellId * SequencesPerCell) - 1` -> `(2 * 2199023255552) - 1` -> `4398046511103`
- The above decision supports up to [Cells 1.5](iterations/cells-1.5.md) but not [Cells 2.0](iterations/cells-2.0.md).
- To support Cells 2.0 (that is, to allow moving organizations from
Cells to the Legacy Cell), all integer IDs in the Legacy Cell must be converted to `bigint`.
This is an ongoing effort as part of [core-platform-section/data-stores/-/issues/111](https://gitlab.com/gitlab-org/core-platform-section/data-stores/-/issues/111)
and is estimated to take around 12 months.
More details on the decision taken and other solutions evaluated can be found [here](decisions/008_database_sequences.md)
and the reasoning behind choosing the logic to generate sequence ranges can be found [here](https://gitlab.com/gitlab-org/gitlab/-/issues/465809).
More details on the decision taken and other solutions evaluated can be found [here](decisions/008_database_sequences.md).
```proto
// sequence_request.proto
......@@ -711,7 +746,7 @@ Citations:
1. Google (n.d.). Using private service connect with cloudrun services. Google Cloud. Retrieved Nov 11, 2024, from <https://cloud.google.com/vpc/docs/private-service-connect>
1. Google (n.d.). How multi-region with cloud spanner works. Google Cloud. Retrieved Nov 11, 2024, from <https://cloud.google.com/blog/topics/developers-practitioners/demystifying-cloud-spanner-multi-region-configurations>
1. [ADR for private service connect](..q/decisions/004_vpc_subnet_design/)
1. [ADR for private service connect](decisions/004_vpc_subnet_design.md)
### Performance
......
---
stage: core platform
group: database
title: 'Cells: Unique sequences'
status: accepted
toc_hide: true
---
GitLab today ensures that every database row created has a unique ID, which allows accessing a merge request, CI Job or Project by a known global ID.
Cells will use many distinct, unconnected databases, each of them having a separate ID for most entities.
At a minimum, any ID referenced between a Cell and the shared schema will need to be unique across the cluster to avoid ambiguous references.
Further to required global IDs, it might also be desirable to retain globally unique IDs for all database rows to allow moving organizations between Cells.
## 1. Goal
The goal is to have non-overlapping sequences across the cluster, so that moving organizations between cells does not cause problems.
## 2. Decision
Cells will have bigint IDs while provisioning, and each cell will reach out to the Topology Service to get
its sequence range. TS will ensure that the sequence ranges do not collide with those of other cells.
The range obtained from the SequenceService will be used to set `maxval` and `minval` for all existing ID sequences and any
newly created IDs.
The logic to compute the sequence range and the interactions between cells and the topology service can be found [here](topology_service.md#workflow).
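
As an illustration of how a cell might apply its range, the sketch below builds a PostgreSQL `ALTER SEQUENCE` statement. The helper and the sequence name are hypothetical. Including `START WITH` and `RESTART WITH` assumes a freshly provisioned cell whose sequences have not been used yet. For a sequence that already holds a value inside the range, the restart clause would be left out.

```python
def alter_sequence_sql(sequence_name: str, minval: int, maxval: int) -> str:
    """Build an ALTER SEQUENCE statement that pins a sequence to [minval, maxval]."""
    if not 0 < minval <= maxval < 2**63:
        raise ValueError("range must be positive and fit in a bigint")
    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + sequence_name.replace('"', '""') + '"'
    return (
        f"ALTER SEQUENCE {quoted} MINVALUE {minval} MAXVALUE {maxval} "
        f"START WITH {minval} RESTART WITH {minval};"
    )


print(alter_sequence_sql("projects_id_seq", 1000000000000, 1099999999999))
```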