Extend subtransactions instrumentation to reveal the number of created subtransactions per transaction
Description
For the past couple of weeks we've been trying to get to the bottom of the bug that is causing database-wide contention coming from SubtransControlLocks locking our database queries.
We've contributed a few merge requests adding additional instrumentation:
- ✅ Subtransactions counter ➡ !66477 (merged)
- ✅ `OverwriteProjectService` instrumentation ➡ !66372 (merged)
- ✅ Import/export performance improvement ➡ !66792 (merged)
- ✅ Extend `pg_stat_activity` sampling to include `wait_event` ➡ gitlab-cookbooks/gitlab-exporters!231 (merged)
It seems that this is not enough: we also need to understand the number of subtransactions per transaction, because the global counter without this context will not allow us to find the root cause.
The current understanding is that this is a database-wide problem, not one specifically related to the new build queuing queries. Because we have robust monitoring around those queries, the problem is simply more visible in the Verify area.
Proposal
- Improve the subtransactions instrumentation to include all ActiveRecord queries, not only those coming from `ApplicationRecord`.
- Design a method to aggregate the subtransactions per transaction, presumably using `SELECT txid_current();`.
- Surface the new data in logs or Prometheus.
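
The aggregation step of the proposal could look roughly like the sketch below. It is not the implementation from the related merge request. The class `SubtransactionTracker` and its methods are hypothetical names, and a local sequence number stands in for a real transaction id such as the value returned by `SELECT txid_current();`. The sketch assumes that every SQL statement issued on one connection can be passed to `observe`, and it treats each `SAVEPOINT` as one subtransaction.

```ruby
require 'json'

# Hypothetical helper (not GitLab code): counts SAVEPOINTs (subtransactions)
# per top-level transaction by observing the SQL sent on one connection.
class SubtransactionTracker
  attr_reader :finished

  def initialize
    @current = nil   # stats for the open top-level transaction, if any
    @finished = []   # stats for transactions that have already ended
    @sequence = 0    # assumption: stand-in for a real transaction id
  end

  # Feed every SQL statement issued on the connection through this method.
  def observe(sql)
    case sql.strip
    when /\A(BEGIN|START TRANSACTION)\b/i
      @sequence += 1
      @current = { txid: @sequence, subtransactions: 0 }
    when /\ASAVEPOINT\b/i
      @current[:subtransactions] += 1 if @current
    when /\A(COMMIT|END)\b/i, /\AROLLBACK\b(?!\s+TO\b)/i
      # A plain ROLLBACK ends the transaction; ROLLBACK TO SAVEPOINT does not.
      finish
    end
  end

  # One JSON line per finished transaction, suitable for a structured log.
  def to_log_lines
    @finished.map { |stats| JSON.generate(stats) }
  end

  private

  def finish
    return unless @current

    @finished << @current
    @current = nil
  end
end

tracker = SubtransactionTracker.new
[
  'BEGIN',
  'SAVEPOINT active_record_1',
  'RELEASE SAVEPOINT active_record_1',
  'SAVEPOINT active_record_1',
  'ROLLBACK TO SAVEPOINT active_record_1',
  'COMMIT'
].each { |sql| tracker.observe(sql) }

puts tracker.to_log_lines
```

Emitting one record per finished transaction, rather than incrementing a single global counter, keeps the per-transaction context that the global counter lacks. The same records could feed a Prometheus histogram instead of a log line.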
More details about the proposal can be found in the following comment
Screenshots
Queuing query duration (this affects the entire database, not just the CI/CD area):
Subtransactions created per model on per second rate:
Active database queries being locked by subtransactions:
Team Coordination
Because this issue has come up so frequently in production incidents over the past week, we are pushing to get this new instrumentation into production ASAP. We are planning to "follow the sun" to get this over the line. We may need to enlist help over the weekend to complete this instrumentation effort.
Related MR: !67918 (merged)
Role | APAC | EMEA | AMER | Notes |
---|---|---|---|---|
Development | TBD | @grzesiek | @stanhu | Focused on development of MR !67918 (merged) |
Backend Review | TBD | @dgruzd | TBD | Tracking development work to gain familiarity and minimize back and forth when ready for review |
Database Review | @Kras | @dgruzd | @stomlinson | Tracking development work to gain familiarity and minimize back and forth when ready for review |
Delivery | @hphilipps | @hphilipps | @rspeicher | Release Manager schedule is available at: https://about.gitlab.com/community/release-managers/ - Delivery team members to coordinate deployment of MR !67918 (merged) |
/cc @stanhu @smcgivern @brentnewton @igorwwwwwwwwwwwwwwwwwwww