Address Primary Key overflow risk for tables with an integer PK
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
*This page may contain information related to upcoming products, features and functionality.
It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.*
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
We currently have numerous tables in GitLab's database that are defined using an integer (`int4`) Primary Key.
The main reason for that is that in Ruby on Rails versions older than 5.1, by default PKs were defined using 4-byte integers. We have switched to using `bigint` (`8-byte integers`) for primary keys since [late 2018](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/53481), but most of our large tables have been created earlier and are still using `int4` Primary Keys.
The problem is that `int4` can support values up to `2^31 - 1`, meaning that it will overflow once a table gets close to 2.15 Billion records. A more in depth discussion and additional details can be found in gitlab-com/www-gitlab-com!20217
We can quickly check the current tables at risk in the [PostgreSQL /
Postgres Integer Capacity](https://dashboards.gitlab.net/d/qLMTXK1Gk/postgres-integer-capacity?orgId=1&var-environment=gprd) Grafana Dasboard.
This epic organizes our efforts for addressing the aforementioned problem:
1. Monitor the Database and track all the tables with a capacity above a threshold (e.g. 10-20%)
1. Figure out a way to switch the primary key data type from `int4` to `int8` without downtime or significantly affecting the performance of those large, popular tables.
1. Run migrations for each table at risk and switch the PK to `int8`
### Primary Key Statistics Summary
Instead of manually updating the summary we will use the following Grafana link to track PK growth: https://dashboards.gitlab.net/d/qLMTXK1Gk/postgres-integer-capacity?from=now-30d&orgId=1&to=now&var-environment=gprd
#### PK Migrations (Steps)
<details>
<summary>Workflow</summary>
``` mermaid
flowchart TD
subgraph risk[Table - PK Exhaustion Risk]
step_1[Milestone 1 - Create bigint column<br> Create trigger to keep columns in sync] --> step_2
step_2[Milestone 2 - Swap integer and bigint column] --> step_3
step_3[Milestone 3 - Drop trigger<br>Ignore old integer column]
end
subgraph mitigated[Table - PK Exhaustion Mitigated]
step_4[Milestone 4 - Drop old integer column] --> step_5
step_5[Milestone 5 - Remove ignore rule]
end
risk --> mitigated
classDef red fill:#FF6666,stroke:#333,stroke-width:4px;
classDef green fill:#99FF99,stroke:#333,stroke-width:4px;
class risk red
class mitigated green
```
</details>
<details>
<summary>Progress Summary</summary>
```mermaid
flowchart TD
subgraph risk[Table - PK Exhaustion Risk]
step_1[Milestone 1 - Create bigint column<br> Create trigger to keep columns in sync] --> step_2
step_2[Milestone 2 - Swap integer and bigint column] --> step_3
step_3[Milestone 3 - Drop trigger]
end
subgraph mitigated[Table - PK Exhaustion Mitigated]
step_4[Milestone 4 - Drop old integer column<br><br>Scheduled for 14.4<br>ci_build_needs<br>ci_build_trace_chunks<br>ci_builds<br>ci_builds_metadata<br>ci_job_artifacts<br>ci_sources_pipelines<br>ci_stages<br>deployments<br>events<br>geo_job_artifact_deleted_events<br>push_event_payloads<br>taggings] --> step_5
step_5[Milestone 5 - Remove ignore rule<br><br>Scheduled for 14.4<br>ci_builds_runner_session]
end
risk --> mitigated
classDef red fill:#FF6666,stroke:#333,stroke-width:4px;
classDef green fill:#99FF99,stroke:#333,stroke-width:4px;
class risk red
class mitigated green
```
</details>
The table below tracks the detailed progress.
| Table | Prep | Backfill .com | Backfill Release Nr. | Column swap | Column drop | Comment | Next step |
|-----------------------------------|--------------------|--------------------|----------------------|-------------|-------------|--------------------------|------------------------------------------------------------------------------------|
| `web_hook_logs` | :white_check_mark: | :white_check_mark: | 13.10 | gitlab-org/gitlab#323677 | https://gitlab.com/gitlab-org/gitlab/-/issues/323678 | (partitioning migration) | Done :white_check_mark: |
| `ci_job_artifacts` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/325615<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65601 | https://gitlab.com/gitlab-org/gitlab/-/issues/338068<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68770 | Migration [takes ~13h](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65601#note_630437990), consider creating indexes/FK upfront | |
| `ci_builds` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/325619<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65201 | https://gitlab.com/gitlab-org/gitlab/-/issues/338067 | | |
| `ci_build_needs` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328046<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65216 | https://gitlab.com/gitlab-org/gitlab/-/issues/338060 | | |
| `ci_build_trace_sections` | :white_check_mark: | Done except 1 failure :boom: | 14.0 | (won't do) | https://gitlab.com/gitlab-org/gitlab/-/issues/328433 | Table will be [dropped in 14.2](https://gitlab.com/gitlab-org/gitlab/-/issues/32565#note_603138100) | |
| `ci_sources_pipelines` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328436<br/> https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66120 | https://gitlab.com/gitlab-org/gitlab/-/issues/338059<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69281 | | |
| `ci_build_trace_chunks` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328443<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66123 | https://gitlab.com/gitlab-org/gitlab/-/issues/338069 | | |
| `ci_builds_runner_session` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328445<br/> https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66433 | https://gitlab.com/gitlab-org/gitlab/-/issues/338057<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68542 | | |
| `ci_builds_metadata` | :white_check_mark: | :white_check_mark: | 14.1 | https://gitlab.com/gitlab-org/gitlab/-/issues/325617<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65692 | https://gitlab.com/gitlab-org/gitlab/-/issues/338070 | Migration [takes \~10h](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65692#note_625383681), consider creating indexes/FK upfront | | |
| `ci_stages` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/327623<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66088 | https://gitlab.com/gitlab-org/gitlab/-/issues/338066 | | | |
| `push_event_payloads` | :white_check_mark: | :white_check_mark: | 14.0 | 14.2 https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65795 | https://gitlab.com/gitlab-org/gitlab/-/issues/338071 | | |
| `events` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/288005<br/> https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64779 | https://gitlab.com/gitlab-org/gitlab/-/issues/338072 | | |
| `taggings` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/334823<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66625 | https://gitlab.com/gitlab-org/gitlab/-/issues/338055 | Migration [takes \~3](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66625#note_634238768), consider creating indexes/FK upfront | |
| `deployments` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/334824<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67341 | https://gitlab.com/gitlab-org/gitlab/-/issues/338054 | | |
| `geo_job_artifact_deleted_events` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/334833<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66763 | https://gitlab.com/gitlab-org/gitlab/-/issues/338050 | | |
epic