Reduce lock contention on table `internal_ids`
Goal:
As a stakeholder in the availability of GitLab.com, I want to reduce the occurrence of service outages and slowness caused by database connection saturation, so that our end-users can continue to reliably use our services during our daily peak workload.
To do so, we need to reduce or avoid the row-level lock contention on table `internal_ids`. This contention is suspected to be the root cause of the production environment starving for db connections during the peak workload of each weekday, starting around 2019-06-10.
Context:
This task is a follow-up to the production regression on GitLab.com documented here: #895 (closed)
For background on why this lock contention is suspected as the root cause, see the comments starting here: #895 (comment 181780864)
My current mental model of the problem space:
We have a concurrency bottleneck: the table `internal_ids` is being used as a sequence-number generator scoped to each individual Project, and when more than one db client asks for the next sequence value for a Project, the generator can only service one such transaction at a time. (Native Postgres sequence objects don't have that concurrency constraint, but this table-based implementation does.) The duration of the transaction is up to the client, and it is unfortunately sometimes measured in seconds, not milliseconds. The pathology we're observing, where PgBouncer starves for available connection slots, is a side-effect of:
(a) the serial nature of obtaining the next sequence number for a given Project
(b) the arbitrary length of time that the first transaction blocks subsequent transactions that also want the next sequence number
Solving either (a) or (b) would resolve the lock contention, and hence also the pathology we're observing, where PgBouncer's connection pool becomes overwhelmed by blocked transactions waiting on our home-made sequence generator.
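To make the mechanics of (a) and (b) concrete, here is a minimal Python sketch (not GitLab's actual code; the class and names are hypothetical) where a `threading.Lock` stands in for the row-level lock on a Project's `internal_ids` row. Because the lock is held for the caller's entire "transaction", all callers for the same Project serialize behind whichever transaction got there first:

```python
import threading

# Hypothetical stand-in for the table-backed generator: the lock plays the
# role of the row-level lock on one Project's internal_ids row.
class TableBackedCounter:
    def __init__(self):
        self._row_lock = threading.Lock()
        self._last_value = 0

    def next_in_transaction(self, do_work):
        # The lock is held for the whole transaction (point b), so any other
        # client wanting the next value for this Project must wait, however
        # long do_work takes -- sometimes seconds in production.
        with self._row_lock:
            self._last_value += 1
            value = self._last_value
            do_work(value)  # arbitrary client work inside the transaction
        return value

results = []
counter = TableBackedCounter()
threads = [
    threading.Thread(
        target=lambda: results.append(counter.next_in_transaction(lambda v: None)))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Every caller got a distinct value, but only one transaction ran at a time.
```

Each blocked caller in the real system also holds a PgBouncer connection slot while it waits, which is how per-Project serialization escalates into pool-wide saturation.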
As a general rule, concurrency optimization usually involves reducing either the duration or scope of a highly contended mutex. Let's discuss some concrete options along those lines.
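As one illustration of the "reduce the duration" direction, here is a sketch (hypothetical names, and it assumes the id can safely be reserved before the long-running work, i.e. an unused id may be "burned" if the transaction later aborts): the lock covers only the increment itself, so long transactions no longer block each other.

```python
import threading

# Sketch of shrinking the critical section (attacking point b): the lock is
# held only long enough to hand out the next value, mimicking how a native
# Postgres sequence's nextval() does not block for the caller's transaction.
class ShortLockCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self._last_value = 0

    def next_value(self):
        with self._lock:  # held only for the increment itself
            self._last_value += 1
            return self._last_value

counter = ShortLockCounter()

def client():
    v = counter.next_value()
    # ... the long-running transaction work happens here, outside the lock,
    # so other clients can obtain their ids concurrently ...
    return v
```

The trade-off mirrors native sequences: values handed out this way are not rolled back on abort, so gaps become possible, which is why the contended-but-gapless table design existed in the first place.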