Address Primary Key overflow risk for tables with an integer PK
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION --> *This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.* <!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION --> We currently have numerous tables in GitLab's database that are defined using an integer (`int4`) Primary Key. The main reason for that is that in Ruby on Rails versions older than 5.1, by default PKs were defined using 4-byte integers. We have switched to using `bigint` (`8-byte integers`) for primary keys since [late 2018](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/53481), but most of our large tables have been created earlier and are still using `int4` Primary Keys. The problem is that `int4` can support values up to `2^31 - 1`, meaning that it will overflow once a table gets close to 2.15 Billion records. A more in depth discussion and additional details can be found in gitlab-com/www-gitlab-com!20217 We can quickly check the current tables at risk in the [PostgreSQL / Postgres Integer Capacity](https://dashboards.gitlab.net/d/qLMTXK1Gk/postgres-integer-capacity?orgId=1&var-environment=gprd) Grafana Dasboard. This epic organizes our efforts for addressing the aforementioned problem: 1. Monitor the Database and track all the tables with a capacity above a threshold (e.g. 10-20%) 1. Figure out a way to switch the primary key data type from `int4` to `int8` without downtime or significantly affecting the performance of those large, popular tables. 1. Run migrations for each table at risk and switch the PK to `int8` ### Primary Key Statistics Summary Instead of manually updating the summary we will use the following Grafana link to track PK growth: https://dashboards.gitlab.net/d/qLMTXK1Gk/postgres-integer-capacity?from=now-30d&orgId=1&to=now&var-environment=gprd #### PK Migrations (Steps) <details> <summary>Workflow</summary> ``` mermaid flowchart TD subgraph risk[Table - PK Exhaustion Risk] step_1[Milestone 1 - Create bigint column<br> Create trigger to keep columns in sync] --> step_2 step_2[Milestone 2 - Swap integer and bigint column] --> step_3 step_3[Milestone 3 - Drop trigger<br>Ignore old integer column] end subgraph mitigated[Table - PK Exhaustion Mitigated] step_4[Milestone 4 - Drop old integer column] --> step_5 step_5[Milestone 5 - Remove ignore rule] end risk --> mitigated classDef red fill:#FF6666,stroke:#333,stroke-width:4px; classDef green fill:#99FF99,stroke:#333,stroke-width:4px; class risk red class mitigated green ``` </details> <details> <summary>Progress Summary</summary> ```mermaid flowchart TD subgraph risk[Table - PK Exhaustion Risk] step_1[Milestone 1 - Create bigint column<br> Create trigger to keep columns in sync] --> step_2 step_2[Milestone 2 - Swap integer and bigint column] --> step_3 step_3[Milestone 3 - Drop trigger] end subgraph mitigated[Table - PK Exhaustion Mitigated] step_4[Milestone 4 - Drop old integer column<br><br>Scheduled for 14.4<br>ci_build_needs<br>ci_build_trace_chunks<br>ci_builds<br>ci_builds_metadata<br>ci_job_artifacts<br>ci_sources_pipelines<br>ci_stages<br>deployments<br>events<br>geo_job_artifact_deleted_events<br>push_event_payloads<br>taggings] --> step_5 step_5[Milestone 5 - Remove ignore rule<br><br>Scheduled for 14.4<br>ci_builds_runner_session] end risk --> mitigated classDef red fill:#FF6666,stroke:#333,stroke-width:4px; classDef green fill:#99FF99,stroke:#333,stroke-width:4px; class risk red class mitigated green ``` </details> The table below tracks the detailed progress. | Table | Prep | Backfill .com | Backfill Release Nr. | Column swap | Column drop | Comment | Next step | |-----------------------------------|--------------------|--------------------|----------------------|-------------|-------------|--------------------------|------------------------------------------------------------------------------------| | `web_hook_logs` | :white_check_mark: | :white_check_mark: | 13.10 | gitlab-org/gitlab#323677 | https://gitlab.com/gitlab-org/gitlab/-/issues/323678 | (partitioning migration) | Done :white_check_mark: | | `ci_job_artifacts` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/325615<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65601 | https://gitlab.com/gitlab-org/gitlab/-/issues/338068<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68770 | Migration [takes ~13h](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65601#note_630437990), consider creating indexes/FK upfront | | | `ci_builds` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/325619<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65201 | https://gitlab.com/gitlab-org/gitlab/-/issues/338067 | | | | `ci_build_needs` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328046<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65216 | https://gitlab.com/gitlab-org/gitlab/-/issues/338060 | | | | `ci_build_trace_sections` | :white_check_mark: | Done except 1 failure :boom: | 14.0 | (won't do) | https://gitlab.com/gitlab-org/gitlab/-/issues/328433 | Table will be [dropped in 14.2](https://gitlab.com/gitlab-org/gitlab/-/issues/32565#note_603138100) | | | `ci_sources_pipelines` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328436<br/> https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66120 | https://gitlab.com/gitlab-org/gitlab/-/issues/338059<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69281 | | | | `ci_build_trace_chunks` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328443<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66123 | https://gitlab.com/gitlab-org/gitlab/-/issues/338069 | | | | `ci_builds_runner_session` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/328445<br/> https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66433 | https://gitlab.com/gitlab-org/gitlab/-/issues/338057<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68542 | | | | `ci_builds_metadata` | :white_check_mark: | :white_check_mark: | 14.1 | https://gitlab.com/gitlab-org/gitlab/-/issues/325617<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65692 | https://gitlab.com/gitlab-org/gitlab/-/issues/338070 | Migration [takes \~10h](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65692#note_625383681), consider creating indexes/FK upfront | | | | `ci_stages` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/327623<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66088 | https://gitlab.com/gitlab-org/gitlab/-/issues/338066 | | | | | `push_event_payloads` | :white_check_mark: | :white_check_mark: | 14.0 | 14.2 https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65795 | https://gitlab.com/gitlab-org/gitlab/-/issues/338071 | | | | `events` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/288005<br/> https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64779 | https://gitlab.com/gitlab-org/gitlab/-/issues/338072 | | | | `taggings` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/334823<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66625 | https://gitlab.com/gitlab-org/gitlab/-/issues/338055 | Migration [takes \~3](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66625#note_634238768), consider creating indexes/FK upfront | | | `deployments` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/334824<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67341 | https://gitlab.com/gitlab-org/gitlab/-/issues/338054 | | | | `geo_job_artifact_deleted_events` | :white_check_mark: | :white_check_mark: | 14.0 | https://gitlab.com/gitlab-org/gitlab/-/issues/334833<br/>https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66763 | https://gitlab.com/gitlab-org/gitlab/-/issues/338050 | | |
epic