Evaluate decomposing Secure- and Govern-related tables into a separate Postgres DB
Related to the discussions in https://gitlab.com/groups/gitlab-org/-/epics/11639#phase-2-identify-section-level-opportunities, we might want to evaluate moving some of our larger sets of tables out of the Main DB where they are loosely coupled to other models and offer a large scaling opportunity.
Existing data
There are already plans to partition tables related to devops::govern and devops::secure because they are getting too large, but the dataset as a whole is also a large workload on our Main DB. vulnerability_occurrences is among the 15 largest tables, and the dataset stands out for its loose coupling to core GitLab functionality, which makes it a good candidate for extraction.
Quite a few Sec tables are also larger than 50 GiB, and several are larger than 100 GiB.
Overall, security-related features account for around 39% of tuple updates in our main database. This roughly correlates with the share of write IO they add to the primary database, so moving them out might reduce our most concerning main database metrics (CPU and IO) by roughly 39%.
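A share like the 39% figure can be derived from per-table tuple-update counters, for example the `n_tup_upd` column of PostgreSQL's `pg_stat_user_tables` view. The sketch below assumes those counters have already been collected into a dictionary keyed by table name; the `update_share` helper and the toy counter values are hypothetical and for illustration only, not real GitLab.com data. It also assumes, as the text above does, that tuple updates are a reasonable proxy for write IO and CPU load.

```python
from typing import Iterable, Mapping


def update_share(n_tup_upd: Mapping[str, int], candidate_tables: Iterable[str]) -> float:
    """Return the fraction of all tuple updates that land on candidate_tables.

    n_tup_upd: hypothetical mapping of table name -> cumulative update count,
    e.g. as read from pg_stat_user_tables.n_tup_upd.
    """
    total = sum(n_tup_upd.values())
    if total == 0:
        # No recorded updates: nothing would move with the candidate tables.
        return 0.0
    candidates = set(candidate_tables)
    moved = sum(count for table, count in n_tup_upd.items() if table in candidates)
    return moved / total


if __name__ == "__main__":
    # Toy counters, invented for illustration only.
    toy = {"vulnerability_occurrences": 3, "projects": 5, "issues": 2}
    share = update_share(toy, ["vulnerability_occurrences"])
    print(f"{share:.0%} of updates would move to the new DB")  # 30% ...
```

The counters in `pg_stat_user_tables` are cumulative since the last statistics reset, so comparing two snapshots over the same window gives a fairer picture than a single read.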
Longer term benefits
- We expect this data to grow significantly as we add newer features such as continuous vulnerability scanning, which might make it an even bigger standout workload.
- The teams working on these features already struggle to get certain features past DB review, because risky changes on such a large amount of data have a high potential to cause incidents that take down all of GitLab.com. A separate DB would give them a lower-risk environment where mistakes can only take down a limited set of features rather than all functionality.
- Their own DB would provide additional headroom, which may accelerate development of other features that require a lot more DB resources and give the teams more time to implement things like partitioning.
- As a business, we'll be able to attribute the cost of these workloads more clearly because they run on their own DB, so we can make better decisions about how much to invest in optimizing our DB design vs. adding features.