Database Team - 17.1 Planning
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Capacity
For a list of upcoming absences, please refer to our weekly status update. Please also keep in mind that about half of the team's capacity is typically consumed by unplanned work.
Alex will be on parental leave starting in May; his responsibilities will be divided as outlined in https://gitlab.com/gitlab-org/core-platform-section/discussions/-/issues/150.
Boards
Planning
We are maintaining our focus on the initiatives that most affect the availability and reliability of GitLab.com and self-managed instances:
- Key saturation thresholds and mitigation strategies are summarized here (confidential issue).
Stable Counterpart Support
- We have one FTE engineer assigned as a stable counterpart to support datastore solutions for AI-related initiatives.
- We have two FTE engineers assigned to support Tenant Cell-related efforts.
17.1
Priorities
Solutions to mitigate Lightweight Lock contention
Related OKR: https://gitlab.com/gitlab-com/gitlab-OKRs/-/work_items/7359+
What: We've previously put short-term mitigations in place, and we will continue to invest in long-term solutions and monitoring. Our main goals are to develop consistent and reliable monitoring of lightweight lock contention, identify the queries that contribute most to lightweight lock saturation, and experiment with ways to mitigate their impact.
Why: To support the long-term scalability of GitLab.com, we want to reduce lightweight lock saturation to a manageable level.
DRI: @stomlinson
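As an illustration of the query-identification goal, the sketch below ranks queries by how often they are caught waiting on a lightweight lock. It assumes periodic samples of PostgreSQL's `pg_stat_activity` view, reduced to the `query_id` (PostgreSQL 14 and later), `wait_event_type`, and `wait_event` columns. The sampling approach and the name `top_lwlock_queries` are hypothetical and do not describe the team's actual monitoring.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_lwlock_queries(
    samples: Iterable[Mapping[str, object]], n: int = 5
) -> List[Tuple[object, int, float]]:
    """Rank query_ids by how often they were sampled waiting on an LWLock.

    Hypothetical input: each sample is one row from a periodic snapshot of
    pg_stat_activity, reduced to its query_id, wait_event_type and
    wait_event columns. Returns (query_id, lwlock_samples, share) tuples,
    where share is the fraction of all LWLock samples for that query_id.
    """
    counts = Counter(
        row["query_id"]
        for row in samples
        # Only count backends that were waiting on a lightweight lock and
        # that have a query_id (it is NULL when query ids are not computed).
        if row.get("wait_event_type") == "LWLock" and row.get("query_id") is not None
    )
    total = sum(counts.values())
    if total == 0:
        return []
    return [(query_id, hits, hits / total) for query_id, hits in counts.most_common(n)]
```

Counting samples rather than measuring wait durations is a simplification: a query that shows up in many samples is a likely contributor, but the ranking is only as reliable as the sampling interval. Grouping by `wait_event` as well (for example `LockManager` versus `BufferMapping`) would show which lightweight lock each query contends on.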
WAL Rate Reduction
Related OKR: https://gitlab.com/gitlab-com/gitlab-OKRs/-/work_items/7228+
What: We're using existing metrics to identify queries that generate excessive WAL, and we're working with the teams that own them to rewrite those queries and features so they generate less WAL.
Why: To support the long-term scalability of GitLab.com, we want to reduce WAL generation to a manageable level.
DRI: @krasio
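A minimal sketch of the ranking step, assuming the input is two snapshots of `pg_stat_statements`, which exposes cumulative `calls` and `wal_bytes` counters per `queryid` from PostgreSQL 13 onward. The page does not say which metrics the team uses; `StatementStats` and `top_wal_generators` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class StatementStats:
    """Cumulative counters for one queryid, as read from pg_stat_statements."""
    calls: int
    wal_bytes: int


def top_wal_generators(
    before: Dict[int, StatementStats],
    after: Dict[int, StatementStats],
    n: int = 10,
) -> List[Tuple[int, int, Optional[float]]]:
    """Rank queryids by WAL bytes generated between two snapshots.

    The counters are cumulative, so activity over an interval is the
    difference between two snapshots. Returns
    (queryid, wal_bytes_in_interval, wal_bytes_per_call), largest first.
    """
    results = []
    for queryid, now in after.items():
        # A queryid missing from the first snapshot started from zero.
        prev = before.get(queryid, StatementStats(calls=0, wal_bytes=0))
        delta_bytes = now.wal_bytes - prev.wal_bytes
        delta_calls = now.calls - prev.calls
        if delta_bytes <= 0:
            # No new WAL, or the statistics were reset between snapshots.
            continue
        per_call = delta_bytes / delta_calls if delta_calls > 0 else None
        results.append((queryid, delta_bytes, per_call))
    results.sort(key=lambda row: row[1], reverse=True)
    return results[:n]
```

Ranking by total WAL volume finds the largest contributors, while WAL bytes per call helps separate a query that is inherently write-heavy from one that is simply executed very often; the two cases usually call for different rewrites.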
Cells: Cluster-wide unique identifiers
Related OKR: https://gitlab.com/gitlab-com/gitlab-OKRs/-/work_items/7229+
What: The Cells team has requested our assistance in establishing cluster-wide unique identifiers.
Why: The Cells project is very important for our long-term stability, and our team will support it as much as possible.
Update: After the Cells fast-boot, the team has a clearer picture of the direction to take. Next up is finalizing our plans for generating sequences and finishing the conversion of new servers to use big integers (bigint) as primary keys.
DRI: @praba.m7n
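One common way to keep identifiers unique across a cluster, shown here purely as a hypothetical illustration (the team's plan for generating sequences is still being finalized), is to reserve a disjoint block of the `bigint` id space for each cell. A PostgreSQL `integer` column tops out at 2,147,483,647 (2^31 - 1), while `bigint` reaches 9,223,372,036,854,775,807 (2^63 - 1), which is what makes this kind of range reservation practical. The functions below are assumptions for illustration only.

```python
from typing import Tuple

BIGINT_MAX = 2**63 - 1  # largest value a PostgreSQL bigint column can hold


def cell_sequence_range(cell_index: int, range_size: int) -> Tuple[int, int]:
    """Return the inclusive (start, end) id block reserved for one cell.

    Hypothetical scheme: blocks are contiguous and non-overlapping, so ids
    drawn from different cells can never collide.
    """
    if cell_index < 0 or range_size <= 0:
        raise ValueError("cell_index must be >= 0 and range_size must be > 0")
    start = cell_index * range_size + 1  # skip 0; ascending sequences start at 1 by default
    end = start + range_size - 1
    if end > BIGINT_MAX:
        raise ValueError("block does not fit in the bigint range")
    return start, end


def max_cells(range_size: int) -> int:
    """Number of cells that can each receive a block of range_size ids."""
    return BIGINT_MAX // range_size
```

With 2^40 ids per cell (512 times the entire `integer` range), `max_cells(2**40)` is 8,388,607. Each cell could then create its sequences with PostgreSQL's `CREATE SEQUENCE ... MINVALUE start MAXVALUE end START WITH start` options so that generated values never leave the reserved block.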
Table size reduction effort
DRI: @mattkasa
While reducing lightweight lock contention is the team's primary concern (and therefore its primary focus), we're still working with devops::create and devops::verify to partition some of the largest tables in our database. @stomlinson is stepping back from these topics to focus on the WAL rate investigation.
Related items:
- CI Partitioning Support
  - Update: Work is ongoing to partition new tables and migrate the ci_pipelines primary key to bigint.
- Partition merge_request_diff_*
  - Update: @stomlinson has been working on adding composite primary key support to the background migration framework in order to support migrating these tables.
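The composite primary key work can be illustrated with keyset pagination. With a two-column key, batching on the first column alone cannot resume in the middle of a run of rows that share the same first-column value, so the cursor has to carry both columns. This is a minimal sketch under that assumption, using a hypothetical key `(col_a, col_b)`; it is not the background migration framework's actual implementation.

```python
from bisect import bisect_right
from typing import Iterator, List, Optional, Sequence, Tuple

# Hypothetical two-column primary key: (col_a, col_b).
Key = Tuple[int, int]


def keyset_batches(keys: Sequence[Key], batch_size: int) -> Iterator[List[Key]]:
    """Yield batches of composite keys in key order.

    Each batch resumes strictly after the last key of the previous batch,
    mirroring the SQL pattern:
        WHERE (col_a, col_b) > (:last_a, :last_b)
        ORDER BY col_a, col_b
        LIMIT :batch_size
    Python tuples compare lexicographically, as PostgreSQL row comparisons do.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ordered = sorted(keys)  # stands in for an ordered scan of the primary key index
    last: Optional[Key] = None
    while True:
        # Seek to the first key strictly greater than the cursor.
        start = 0 if last is None else bisect_right(ordered, last)
        batch = list(ordered[start:start + batch_size])
        if not batch:
            return
        yield batch
        last = batch[-1]  # the cursor holds both key columns
```

For example, keys `[(2, 1), (1, 2), (1, 1), (3, 0), (2, 0)]` with `batch_size=2` yield `[(1, 1), (1, 2)]`, then `[(2, 0), (2, 1)]`, then `[(3, 0)]`. In the database, the seek would be served by the primary key index rather than the in-memory `bisect` used here.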
Migrations run in order of milestone, then type, then version (Secondary Objective)
DRI: @jon_jenkins
What: We aim to enrich database migrations by tagging each one with the milestone it belongs to. This will let us improve how we order migrations when executing them, instead of relying on each migration's timestamp (version), which is somewhat arbitrary and can be misleading.
Why: This will improve the upgrade experience for self-managed customers who jump multiple milestones at a time: if they hit an error while executing migrations, we will be able to tell which GitLab release their schema is compatible with.
Update: Code and testing are complete, and the code is now going through final review.
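The ordering rule in the heading (milestone, then type, then version) can be sketched as a sort key. This is a hypothetical illustration: the `Migration` fields, the type names `regular` and `post_deploy`, and their relative rank are assumptions, not the format the team implemented. The second helper shows how milestone tags could answer the question from the Why paragraph: which release a partially migrated schema is compatible with.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List, Optional, Set, Tuple

# Assumed rank of migration types within a milestone (hypothetical):
# regular schema migrations run before post-deployment migrations.
TYPE_ORDER = {"regular": 0, "post_deploy": 1}


@dataclass(frozen=True)
class Migration:
    version: int    # timestamp-based version, e.g. 20240601000000
    milestone: str  # milestone the migration is tagged with, e.g. "17.1"
    kind: str       # a key of TYPE_ORDER


def milestone_key(milestone: str) -> Tuple[int, ...]:
    # Compare numerically so that "17.10" sorts after "17.9".
    return tuple(int(part) for part in milestone.split("."))


def execution_order(migrations: Iterable[Migration]) -> List[Migration]:
    # Milestone first, then type, with the version (timestamp) as tie-breaker.
    return sorted(
        migrations,
        key=lambda m: (milestone_key(m.milestone), TYPE_ORDER[m.kind], m.version),
    )


def last_complete_milestone(
    migrations: Iterable[Migration], applied_versions: Set[int]
) -> Optional[str]:
    """Return the newest milestone whose migrations have all been applied.

    Because execution order groups migrations by milestone, a failure partway
    through leaves every earlier milestone fully applied.
    """
    complete = None
    for milestone, group in groupby(execution_order(migrations), key=lambda m: m.milestone):
        if not all(m.version in applied_versions for m in group):
            break
        complete = milestone
    return complete
```

For example, a migration with version 20240520000000 tagged 17.2 sorts after one with version 20240610000000 tagged 17.1, even though its timestamp is earlier; that is exactly the case where timestamp ordering misleads.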