Database Group - 15.2 Planning
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Capacity
In %15.2 the Database group will continue to be at 50% capacity.
Boards
Planning
We will keep our focus on the initiatives that most affect the availability and reliability of GitLab.com and self-managed instances. Outside of feature development, our top priority will continue to be our hiring efforts. A more in-depth discussion of this decision can be found in the 15.1 planning section.
15.2
Top Priorities for Batched Background Migrations
Who: @dfrazao-gitlab, @krasio, @mattkasa
Continue our work to provide a stable, reliable framework for executing the most complex, error-prone database operations.
With the basic support for batched background migrations completed, we move our focus to the additional features required to support all types of background migrations and make the framework more stable.
Priority topics for %15.2:
- Batched Background Migrations: Improve batch handling
- Identify migration helpers that use old-style migrations
Throttling mechanism for large data changes
Who: @krasio, @stomlinson
We are implementing a generic throttling mechanism for large data changes. It will monitor the health of the database through various signals (leading indicators) and react to problems by throttling or even pausing the execution of the updates.
Priority topics for %15.2:
- Pause migration while autovacuum is running for the table
- Pause migration when the WAL queue pending archival crosses a threshold
Stretch goals:
- Pause migration when Patroni apdex drops below the SLO
- Throttle migration when WAL rate exceeds threshold
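The signals listed above can be sketched as a single decision function. This is only an illustration of the idea, not the group's implementation: the names `HealthSignals`, `Thresholds`, and `decide`, the field names, and every threshold value are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    THROTTLE = "throttle"
    PAUSE = "pause"


@dataclass
class HealthSignals:
    """Snapshot of hypothetical health indicators for one target table."""
    autovacuum_running: bool       # autovacuum is active on the table
    wal_pending_archival: int      # WAL segments waiting to be archived
    wal_rate_bytes_per_sec: float  # current WAL generation rate
    patroni_apdex: float           # apdex score between 0.0 and 1.0


@dataclass
class Thresholds:
    """Illustrative limits; the values are placeholders, not real settings."""
    max_wal_pending_archival: int = 100
    max_wal_rate_bytes_per_sec: float = 50_000_000.0
    min_patroni_apdex: float = 0.95


def decide(signals: HealthSignals, limits: Thresholds) -> Action:
    """Return the most restrictive action that any signal asks for."""
    # Signals that pause the migration entirely are checked first.
    if signals.autovacuum_running:
        return Action.PAUSE
    if signals.wal_pending_archival > limits.max_wal_pending_archival:
        return Action.PAUSE
    if signals.patroni_apdex < limits.min_patroni_apdex:
        return Action.PAUSE
    # A high WAL rate only slows the migration down.
    if signals.wal_rate_bytes_per_sec > limits.max_wal_rate_bytes_per_sec:
        return Action.THROTTLE
    return Action.PROCEED
```

Checking the pause conditions before the throttle condition means that a table under autovacuum is paused even when the WAL rate alone would only call for throttling.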
High Severity / Priority issues
DB load balancer: Automatic retries may leak queries across transaction boundaries
Who: @stomlinson
We have found that time spent inside an after_save callback hook, which runs while the transaction is still open, can cause an idle-in-transaction timeout. PostgreSQL will sever the connection, and subsequent attempts to use the connection will trigger an error, but the load balancer catches these StatementInvalid errors and retries them. The danger here is that the retry can cause SQL queries to leak across transaction boundaries.
In a past evaluation, we showed that the pathology can be generalized to the following: in a read/write transaction, after at least one write has happened, we retry another write following a connection error.
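The pathology can be reproduced with a toy in-memory model. The sketch below is not GitLab's load balancer: `ToyDatabase`, `naive_write`, and `guarded_write` are hypothetical names, and the guard is only one possible safeguard, shown to make the failure mode concrete.

```python
class ConnectionLost(Exception):
    """Raised when the server has severed the connection."""


class ToyDatabase:
    """In-memory stand-in for a server: committed rows survive, pending rows do not."""

    def __init__(self):
        self.committed = []
        self.pending = []
        self.in_transaction = False
        self.connected = True

    def begin(self):
        self.in_transaction = True

    def write(self, row):
        if not self.connected:
            raise ConnectionLost()
        if self.in_transaction:
            self.pending.append(row)    # visible only after a commit
        else:
            self.committed.append(row)  # autocommit outside a transaction

    def sever(self):
        # Models a server-side timeout: the open transaction is rolled back
        # and the connection is closed.
        self.pending.clear()
        self.in_transaction = False
        self.connected = False

    def reconnect(self):
        self.connected = True  # new session with no open transaction


def naive_write(db, row):
    """Retry on a connection error without looking at transaction state."""
    try:
        db.write(row)
    except ConnectionLost:
        db.reconnect()
        db.write(row)


def guarded_write(db, row, writes_in_transaction):
    """Refuse to retry once the current transaction has already written."""
    try:
        db.write(row)
    except ConnectionLost:
        if writes_in_transaction > 0:
            raise
        db.reconnect()
        db.write(row)


def run_leak_demo():
    db = ToyDatabase()
    db.begin()
    naive_write(db, "first")   # write inside the transaction
    db.sever()                 # timeout rolls the transaction back
    naive_write(db, "second")  # retried on a fresh session and autocommitted
    return db.committed
```

In this model `run_leak_demo()` returns `["second"]`: the second write is committed on its own even though the transaction it belonged to, including the first write, was rolled back. With `guarded_write` the same sequence raises `ConnectionLost` instead, so the caller sees the failure.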
In %15.1 we added support for logging the cases when the database load balancer leaks a transaction, and we have started analyzing the results that we are gathering.
In %15.2 our plan is to validate our assumption that this happens when writes are mixed inside a transaction, investigate why we get errors reported for GET requests, add a fix, and monitor whether it solves the problem.
Research and evaluate database migration from Amazon Aurora to RDS
Who: @mattkasa
In %15.1 we successfully set up an environment in AWS with GitLab and Aurora and performed a migration. We are now planning to conclude by running all the necessary tests and proving that this process works without issues.
Our final step will be to document the process and communicate our results with GitLab users.
Update DB group interview exercise
Who: @stomlinson
Hiring is particularly hard right now. To increase the speed and efficiency of our hiring process, we're going to introduce an updated interview exercise that will replace both the existing Rails and database exercises, combining them into a single exercise.
In %15.1 we built out a first example of this exercise with the intent to replace only the database exercise.
In %15.2, we plan on iterating on that template so that it also yields the information we currently get from the Rails exercise and can replace it as well. We also plan on starting to use this exercise this milestone.
Create new database lab node for CI database
Who: @mattkasa
We want to make sure that the new database lab node for the CI database is properly deployed, that the node's replication manages to catch up, and that there are no issues with the setup (e.g. increased lag).