Decompose GitLab.com's database to improve scalability
## Summary

Decomposition is the chosen solution to scale GitLab's database. This approach relies on moving a feature's tables into a separate logical database. We chose this approach because it is iterative and can be implemented in less time than sharding.

## Documentation

Rails developer documentation on what this means for our code can be found at https://docs.gitlab.com/ee/development/database/multiple_databases.html

## Problem to solve

The current database architecture uses a single database cluster. We know that some features require more computing resources than others. By moving the tables of these features to separately owned databases, we ensure that they do not impact other features of GitLab. This also makes it much easier to model the data structure of these features, for example by implementing time decay or another partitioning pattern. It also provides headroom on the main database.

<details>
<summary>Solution details with design overview. Click to expand</summary>

## Solution

We are going to decompose GitLab's database, focusing on CI tables first. CI tables were chosen because they account for ~36% of the overall DB size and roughly 50% of writes. Decomposing them would effectively reduce writes on the main database by about 50%.
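As a rough check of the write figures quoted above, the following sketch recomputes the CI share of writes from the Writes/s column of the "Summary of Impact" table in this epic. The function names and the "headroom factor" are illustrative assumptions, not part of the plan. The factor treats write throughput as the only limiting resource, which this epic does not state.

```python
from typing import Mapping

# Writes/s per area, copied from the "Summary of Impact" table in this epic.
WRITES_PER_SECOND = {
    "Web hook logs": 110.0,
    "Merge Requests": 795.4,
    "CI": 1909.2,
    "Rest": 1083.6,
}


def write_share(area: str, writes: Mapping[str, float]) -> float:
    """Return the fraction of total writes/s attributed to `area`."""
    return writes[area] / sum(writes.values())


def remaining_writes(area: str, writes: Mapping[str, float]) -> float:
    """Return the writes/s left on the main database once `area` is moved out."""
    return sum(rate for name, rate in writes.items() if name != area)


ci_share = write_share("CI", WRITES_PER_SECOND)
main_after = remaining_writes("CI", WRITES_PER_SECOND)

# Hypothetical headroom factor (an assumption for illustration only):
# current total writes divided by the writes that stay on the main database.
headroom = sum(WRITES_PER_SECOND.values()) / main_after

print(f"CI share of writes: {ci_share:.2%}")
print(f"Writes/s left on main: {main_after:.1f}")
print(f"Write headroom factor: {headroom:.2f}x")
```

With the table's numbers, CI comes out at about 49% of writes and the main database keeps about 1989 writes/s. Under the write-bound assumption this gives a factor of roughly 1.96x, which is in line with the "2x (to be confirmed)" estimate but does not confirm it.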
### Design overview

![Decomposition](/uploads/81208224da95f0a7a535bc4c4dd8f85e/Decomposition.png)

## Possible logical databases

### First Iteration

* **gitlab_production_ci: all `ci_*` tables** --> Chosen

### Possible Future Iterations

* gitlab_production_packages: all `packages_*` tables
* gitlab_production_users: all `users*` and associated tables
* gitlab_production_groups: all `projects/groups` and hierarchy
* gitlab_production_code_review: all `issues/merge_requests` and associated tables
* gitlab_production_archive: all `web_hook_logs` and other related tables (data that is expected to be infrequently accessed)

## Testing

TBD

## Summary of Impact

As we investigate different areas of the database potentially targeted for decomposition, we will summarize the metrics below.

[Assumptions for calculating %](https://gitlab.com/gitlab-org/gitlab/-/issues/331523#note_581047057)

| Element        | DB size (GB) | DB size (%) | Reads/s   | Reads/s (%) | Writes/s | Writes/s (%) |
|----------------|--------------|-------------|-----------|-------------|----------|--------------|
| Web hook logs  | 2964.1       | 22.39%      | 52.5      | 0.00%       | 110.0    | 2.82%        |
| Merge Requests | 2673.7       | 20.20%      | 126073.4  | 1.31%       | 795.4    | 20.40%       |
| CI             | 4725.0       | 35.69%      | 1712843.8 | 17.87%      | 1909.2   | 48.98%       |
| Rest           | 2876.3       | 21.73%      | 7748488.5 | 80.82%      | 1083.6   | 27.80%       |

[Google Sheet for reference](https://docs.google.com/spreadsheets/d/1U2cdneKwNOs37AiDxpyxKXtB3X3kFhhxr1hXPJ1bDwY/edit#gid=644136024)

</details>

## Expected Scalability Increase

The expected scalability increase for the first iteration, CI tables decomposition, is 2x (to be confirmed).

<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->

*This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.* <!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
epic