Skip to content
Snippets Groups Projects

Monitor database transaction activity for Rails

Merged Andrew Newdigate requested to merge transaction-metrics into master

This change adds metrics for database transactions and Gitaly n+1 blocks.

Requires https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29903

What does this MR do?

This changes comes out of gitlab-com/gl-infra/production#895 (closed), the production slowdown we've been seeing over the past few weeks (possibly months).

One of the strongest correlations with the slowdown was the number of database connections help open in the state idle in transaction. This state occurs when the client opens a transaction block and then performs multiple actions, some of which may not involve the database system.

The problem with this approach is that it is extremely fragile.

In a high volume system, it is critical that these transaction blocks are cleared as quickly as possible. The longer they are held open, the more impact they will have on scaling. This is for two reasons:

  1. Locks on the database. The impact of this is seen most on the internal_ids table, but this contention can happen elsewhere too
  2. Contention for a postgres connection. Pgbouncer, our connection pooler, is responsible for multiplexing many client connections between a small number of server connections. During a database transaction, the client holds the connection open for an extended period of time, during which time, other clients cannot use that connection, even if the client is holding it open for non-database connection reasons.
    • In the worse case scenario, this could lead to complete deadlock, if the service indirectly ends up calling other services which require a database connection, but cannot obtain one due to lack of connections.

This MR adds some new metrics, which are important for understanding the way GitLab scales.

The hope is that this will allow us to target poorly scaling code. Often, a small rearrangement, moving the non-transaction services calls out of the transactional block, is sufficient to resolve the issue.

In future, it would be ideal if attempting to call a Gitaly service from within a transaction results in an exception being thrown (in a similar manner to the Gitaly n+1 detector).

This approach has been attempted in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29774 but it turns out that this anti-pattern is far more prevalent in the code than I had initially expected. So, as a first step, I plan to monitor and assess.

The new metrics added are:

  • gitlab_database_transaction_seconds_bucket A histogram measuring how much time each controller/action pair spends in transaction blocks.
  • gitaly_n_plus_one_duration_seconds_bucket This helps us understand which GitalyClient.allow_n_plus_one blocks are really hurting our server performance, and additionally notifies us if these n+1 blocks are running within a database transaction (which can be very detrimental)
  • gitaly_n_plus_one_calls_total This helps us understand which n+1s are actually generating the most traffic so that they can be prioritised.

image

Does this MR meet the acceptance criteria?

Conformity

Performance and testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Andrew Newdigate

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • I'd be interested in getting your take on this as a first step, allow us to assess which controller/actions are the worst offenders, so that we can prioritise their remediation.

    @andrewn Capturing the metrics is a great step, thank you!

  • added 1 commit

    • f42f66c2 - Adds metrics to measure cost of expensive operations

    Compare with previous version

  • Andrew Newdigate changed title from Add metrics for measuring the cost of Gitaly n+1 blocks and database transactions to Add metrics for measuring certain expensive operations

    changed title from Add metrics for measuring the cost of Gitaly n+1 blocks and database transactions to Add metrics for measuring certain expensive operations

  • Andrew Newdigate changed the description

    changed the description

  • Andrew Newdigate resolved all discussions

    resolved all discussions

  • @abrandl I've made your suggested changes. Over to you for first pass on review 😄

  • Andrew Newdigate assigned to @abrandl and unassigned @andrewn

    assigned to @abrandl and unassigned @andrewn

  • added 1 commit

    • cd59a1c8 - Adds metrics to measure cost of expensive operations

    Compare with previous version

  • Andrew Newdigate added 2 commits

    added 2 commits

    • 49eaaf78 - Refactor inside_transaction? to Gitlab::Database
    • 41197c06 - Adds metrics to measure cost of expensive operations

    Compare with previous version

  • Maxim Ivanov
  • Andreas Brandl
  • Thanks @andrewn, LGTM for database, I added minor comments. Please pass it on to a backend maintainer for further review and merge.

  • Andreas Brandl assigned to @andrewn and unassigned @abrandl

    assigned to @andrewn and unassigned @abrandl

  • @godfat over to you for the next round of review.

  • Andrew Newdigate assigned to @godfat and unassigned @andrewn

    assigned to @godfat and unassigned @andrewn

  • Andrew Newdigate added 473 commits

    added 473 commits

    Compare with previous version

  • added 1 commit

    • 2b086241 - Adds metrics to measure database transactions

    Compare with previous version

  • Andrew Newdigate changed title from Add metrics for measuring certain expensive operations to Monitor database transaction activity for Rails

    changed title from Add metrics for measuring certain expensive operations to Monitor database transaction activity for Rails

  • Andrew Newdigate mentioned in merge request !30022 (closed)

    mentioned in merge request !30022 (closed)

  • Andrew Newdigate resolved all discussions

    resolved all discussions

  • Lin Jen-Shin changed milestone to %12.1

    changed milestone to %12.1

  • Lin Jen-Shin
  • Lin Jen-Shin
  • Lin Jen-Shin assigned to @andrewn and unassigned @godfat

    assigned to @andrewn and unassigned @godfat

  • added 1 commit

    • 56ae34e4 - Adds metrics to measure database transactions

    Compare with previous version

  • Andrew Newdigate assigned to @godfat and unassigned @andrewn

    assigned to @godfat and unassigned @andrewn

  • Thanks for the review @godfat. I've made changes as per your comments. Another round of review please!

  • Lin Jen-Shin approved this merge request

    approved this merge request

  • Lin Jen-Shin resolved all discussions

    resolved all discussions

  • @andrewn Thank you! Looks good to me.

  • merged

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading