Extend subtransactions instrumentation to reveal the number of created subtransactions per transaction

Description

For a couple of weeks we've been trying to get to the bottom of the bug that is causing database-wide contention, with SubtransControlLocks blocking our database queries.

We've contributed a few merge requests adding additional instrumentation:

  1. Subtransactions counter !66477 (merged)
  2. OverwriteProjectService instrumentation !66372 (merged)
  3. Import/export performance improvement !66792 (merged)
  4. Extend pg_stat_activity sampling to include wait_event gitlab-cookbooks/gitlab-exporters!231 (merged)

It seems that this is not enough: we need to know the number of subtransactions created per transaction, because a global counter without this context will not let us find the root cause.

The current understanding is that this is a database-wide problem, not one specifically related to the new builds queuing queries. Because we have robust monitoring around these queries, the problem is simply more visible in the Verify area.

Proposal

  1. Improve the subtransactions instrumentation to include all ActiveRecord queries, not only those coming from ApplicationRecord.
  2. Design a method to aggregate the subtransactions per transaction, presumably using SELECT txid_current();
  3. Surface the new data in logs or Prometheus.

More details about the proposal can be found in the following comment: #337843 (comment 645101480)
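As a rough illustration of step 2, the sketch below counts SAVEPOINT statements between the start and end of each top-level transaction on a single connection. It is a minimal sketch under stated assumptions, not the instrumentation from the related MR: the class name SubtransactionTracker and its methods are hypothetical, the savepoint names are illustrative, and it assumes every SQL statement can be observed in order. In a real setup the counts could be keyed by the value of txid_current() and exported to logs or Prometheus, as proposed above.

```ruby
# Hypothetical sketch: count subtransactions (SAVEPOINTs) per top-level
# transaction. None of these names come from the GitLab codebase.
class SubtransactionTracker
  BEGIN_TX  = /\A\s*(BEGIN|START TRANSACTION)\b/i
  SAVEPOINT = /\A\s*SAVEPOINT\b/i
  # COMMIT or ROLLBACK ends the transaction; ROLLBACK TO SAVEPOINT does not.
  END_TX    = /\A\s*(COMMIT|ROLLBACK)\b(?!\s+TO\b)/i

  # One integer per finished top-level transaction.
  attr_reader :finished

  def initialize
    @current = nil # savepoint count of the open transaction, nil if none
    @finished = []
  end

  # Feed every SQL statement issued on one connection, in order.
  def observe(sql)
    case sql
    when BEGIN_TX
      @current = 0
    when SAVEPOINT
      @current += 1 if @current
    when END_TX
      if @current
        @finished << @current
        @current = nil
      end
    end
  end
end

tracker = SubtransactionTracker.new
[
  "BEGIN",
  "SAVEPOINT active_record_1",
  "RELEASE SAVEPOINT active_record_1",
  "SAVEPOINT active_record_1",
  "ROLLBACK TO SAVEPOINT active_record_1",
  "COMMIT"
].each { |sql| tracker.observe(sql) }

p tracker.finished
```

Recording one count per finished transaction, rather than incrementing a single global counter, is what would let a histogram or log line distinguish many transactions with one savepoint each from a few transactions with very many.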

Screenshots

Queuing queries duration (this affects the entire database, not just the CI/CD area): queuing_queries_degradation_0806

Per-second rate of subtransactions created per model: subtrans_models_0806

Active database queries being locked by subtransactions: subtrans_locks_0806

Team Coordination

Because this issue has come up so frequently in production incidents over the past week, we are pushing to get this new instrumentation into production ASAP. We are planning to "follow the sun" to get this over the line. We may need to enlist help over the weekend to complete this instrumentation effort.

Related MR: !67918 (merged)

| Role | APAC | EMEA | AMER | Notes |
|------|------|------|------|-------|
| Development | TBD | @grzesiek | @stanhu | Focused on development of MR !67918 (merged) |
| Backend Review | TBD | @dgruzd | TBD | Tracking development work to gain familiarity and minimize back and forth when ready for review |
| Database Review | @Kras | @dgruzd | @stomlinson | Tracking development work to gain familiarity and minimize back and forth when ready for review |
| Delivery | @hphilipps | @hphilipps | @rspeicher | Release Manager schedule is available at: https://about.gitlab.com/community/release-managers/ - Delivery team members to coordinate deployment of MR !67918 (merged) |

/cc @stanhu @smcgivern @brentnewton @igorwwwwwwwwwwwwwwwwwwww

Edited by Chun Du