Reduce table sizes to < 100 GB per physical table
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
*This page may contain information related to upcoming products, features and functionality.
It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.*
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
Architectural Blueprint: [Database Scalability: Limit on-disk table size to < 100 GB for GitLab.com](https://docs.gitlab.com/ee/architecture/blueprints/database_scaling/size-limits.html)
Large tables on GitLab.com are a major problem for both operations and development. They cause a variety of issues:
1. **Query timings** and hence overall application performance suffer
1. **Table maintenance** becomes much more costly. Vacuum activity has become a significant concern on GitLab.com, with large tables only seeing infrequent (e.g. once per day) vacuuming and vacuum runs taking many hours to complete. This has various negative consequences, and a very large table has the potential to impact seemingly unrelated parts of the database and hence overall application performance.
1. **Data migrations** on large tables are significantly more complex to implement and incur development overhead. They have potential to cause stability problems on GitLab.com and take a long time to execute on large datasets.
1. **Index size** is significant. This directly impacts performance, as smaller parts of the index are kept in memory, and it also makes the indexes harder to maintain (think repacking).
1. **Index creation times** go up significantly - in 2021, we see btree creation take up to 6 hours for a single btree index. This impacts our ability to deploy frequently and leads to vacuum-related problems (delayed cleanup).
1. We tend to add **many indexes**
In order to maintain and improve operational stability and lessen development burden, we target a **table size less than 100 GB for a physical table on GitLab.com** (including its indexes).
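As a rough illustration of how the target could be checked, the sketch below pairs a PostgreSQL catalog query with a small Python filter. It is not GitLab tooling: the names `TABLE_SIZES_SQL`, `SIZE_LIMIT_BYTES`, and `tables_over_limit` are hypothetical, and reading "GB" as GiB (1024³ bytes) is an assumption. The query relies on `pg_total_relation_size()`, which counts a table's indexes and TOAST data, and on `relkind = 'r'`, which selects ordinary tables so that each partition is measured on its own.

```python
# Hypothetical helper for illustration only; not part of GitLab's tooling.

GIB = 1024 ** 3                 # assumption: "GB" is interpreted as GiB
SIZE_LIMIT_BYTES = 100 * GIB    # the 100 GB target, including indexes

# Query to run against the database (not executed in this sketch).
# pg_total_relation_size() includes indexes and TOAST data;
# relkind = 'r' restricts the result to ordinary (physical) tables.
TABLE_SIZES_SQL = """
SELECT c.oid::regclass::text         AS table_name,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY total_bytes DESC;
"""


def tables_over_limit(rows, limit_bytes=SIZE_LIMIT_BYTES):
    """Return (table_name, total_bytes) pairs at or above the limit, largest first.

    `rows` is any iterable of (table_name, total_bytes) pairs, for example
    the result set of TABLE_SIZES_SQL.
    """
    offenders = [(name, size) for name, size in rows if size >= limit_bytes]
    return sorted(offenders, key=lambda row: row[1], reverse=True)
```

Because the target is a size of less than 100 GB, a table at exactly the limit is reported as a violation. Passing a different `limit_bytes` gives an earlier warning threshold.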
### Definition of Done
* [ ] Established rule to limit physical table sizes (as they exist on GitLab.com)
* [ ] Identified most impactful work for each problematic table (more than 80 GB) on GitLab.com
* [ ] Prioritized work with backend groups or ~group::database
* [ ] No tables on GitLab.com violate the rule anymore