Skip to content
Snippets Groups Projects

Add instrumentation for recording subtransaction depth

Merged Stan Hu requested to merge sh-observe-subtransaction-depth into master
All threads resolved!

A number of recent, performance slowdowns in the PostgreSQL database due (e.g. gitlab-com/gl-infra/production#5333 (closed)) that cause a high rate of statement timeouts. It looks like most queries are waiting on SubtransControlLock, which indicates something is going on with how PostgreSQL is handling subtransactions.

We know that if the number of nested subtransactions exceeds 64, there is a significant performance penalty (see https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/), but right now we don't know if any migrations or Sidekiq jobs are actually using that many nested subtransactions.

This merge request instruments ActiveRecord in order to find the source of nested subtransactions. This change reports a log message into application_json.log when the depth exceeds 32 subtransactions or the overall transaction exceeds 5 minutes.

A sample log message with thresholds removed:

{
  "severity": "INFO",
  "time": "2021-08-12T21:18:07.628Z",
  "correlation_id": "210c961bee1feb2e68a22fec77ef71fe",
  "class": "Gitlab::Database::Transaction::Context",
  "result": "commit",
  "duration_s": 0.009594183415174484,
  "depth": 0,
  "savepoints_count": 0,
  "rollbacks_count": 0,
  "releases_count": 0,
  "sql": "/*application:sidekiq,correlation_id:210c961bee1feb2e68a22fec77ef71fe,jid:c0449496fd536dccd23889fc,endpoint_id:Geo::VerificationBatchWorker*/ BEGIN"
}
Edited by Stan Hu

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • assigned to @stanhu

  • Stan Hu marked this merge request as draft

    marked this merge request as draft

  • A deleted user added databasereview pending label
  • A deleted user added backend database labels

    added backend database labels

  • 5 Warnings
    :warning: This merge request includes more than 10 commits. Each commit should meet the following criteria:
    1. Have a well-written commit message.
    2. Has all tests passing when used on its own (e.g. when using git checkout SHA).
    3. Can be reverted on its own without also requiring the revert of commit that came before it.
    4. Is small enough that it can be reviewed in isolation in under 30 minutes or so.

    If this merge request contains commits that do not meet this criteria and/or contains intermediate work, please rebase these commits into a smaller number of commits or split this merge request into multiple smaller merge requests.

    :warning: 61e645c7: The commit body should not contain more than 72 characters per line. For more information, take a look at our Commit message guidelines.
    :warning: b4d930ea: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines.
    :warning: 329db813: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines.
    :warning: Please add a merge request type to this merge request.

    Reviewer roulette

    Changes that require review have been detected!

    Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:

    Category Reviewer Maintainer
    backend Marc Shaw (@marc_shaw) (UTC+2, 9 hours ahead of @stanhu) Dmitry Gruzd (@dgruzd) (UTC+3, 10 hours ahead of @stanhu)
    database Andy Soiron (@Andysoiron) (UTC+2, 9 hours ahead of @stanhu) Mayra Cabrera (@mayra-cabrera) (UTC-5, 2 hours ahead of @stanhu)

    To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.

    To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.

    Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.

    If needed, you can retry the danger-review job that generated this comment.

    Generated by :no_entry_sign: Danger

  • mentioned in issue #337843 (closed)

  • Stan Hu
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Kamil Trzciński
  • Grzegorz Bizon added 1 commit

    added 1 commit

    • a72a48e7 - Track transaction savepoints outside of ApplicationRecord

    Compare with previous version

  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon added 1 commit

    added 1 commit

    • bb71d5f0 - Log database subtransaction details into application structured log

    Compare with previous version

  • A deleted user added grouppipeline execution label
  • A deleted user added feature flag label

    added feature flag label

  • Grzegorz Bizon added 1 commit

    added 1 commit

    • a5f31141 - Instrument subtransactions when a feature flag is enabled

    Compare with previous version

  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon added 1 commit

    added 1 commit

    • 8ad0bc1e - Fix database transaction context attributes

    Compare with previous version

  • Grzegorz Bizon
  • Grzegorz Bizon
  • @stanhu This still needs some unit tests but otherwise it seems to be almost ready. Can you take a look for an initial review? Thanks in advance! /cc @sgoldstein

  • Grzegorz Bizon requested review from @stanhu

    requested review from @stanhu

  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • I think that what remains in this MR is:

    1. Unit tests (especially for the patching and Transaction::Context)
    2. Manual tests to see that it works as expected outside of the console
    3. Defining a good threshold for def meaninful? method that return true if we should log this transaction context.
    4. Checking if the queries log contains enough information for us to use, or should we extend tracking even more.

    /cc @stanhu @sgoldstein

  • Stan Hu
  • Stan Hu
  • Kamil Trzciński
  • Stan Hu added 1 commit

    added 1 commit

    • 49f66644 - Add spec for transaction observer

    Compare with previous version

  • Stan Hu mentioned in issue #338291

    mentioned in issue #338291

  • Stan Hu added 1 commit

    added 1 commit

    • 282b2854 - Add tests and fix bug in context duration calculation

    Compare with previous version

  • Stan Hu
    • Author Owner
      Resolved by Grzegorz Bizon

      In my GDK, this merge request works fine if I modify meaningful? to return true. With this script on the console:

      project = Project.first
      
      ApplicationRecord.transaction do 
        ::Vulnerabilities::HistoricalStatistic.safe_ensure_unique(retries: 1) do
           project.vulnerability_historical_statistics.find_or_initialize_by(date: Time.now); nil 
        end
      end

      I see:

      {
        "severity": "INFO",
        "time": "2021-08-11T23:07:49.913Z",
        "correlation_id": null,
        "class": "Gitlab::Database::Transaction::Context",
        "result": "commit",
        "duration_s": 0.016879000002518296,
        "depth": 2,
        "savepoints_count": 1,
        "rollbacks_count": 0,
        "releases_count": 1,
        "sql": "BEGIN /*application:console,line:/lib/gitlab/database/schema_cache_with_renamed_table.rb:25:in `columns'*/"
      }

      However, when I applied this change to my Omnibus instance, I got:

      {
        "severity": "INFO",
        "time": "2021-08-11T23:10:00.357Z",
        "correlation_id": null,
        "class": "Gitlab::Database::Transaction::Context",
        "result": "commit",
        "duration_s": null,
        "depth": 0,
        "savepoints_count": 0,
        "rollbacks_count": 0,
        "releases_count": 0,
        "sql": ""
      }

      I hope this is just an issue with my test setup, but I have a feeling there's something more here.

  • Stan Hu added 1 commit

    added 1 commit

    • 51e6b005 - Handle Marginalia comments present in front of SQL query

    Compare with previous version

  • Stan Hu marked this merge request as ready

    marked this merge request as ready

  • Stan Hu changed the description

    changed the description

  • Stan Hu changed milestone to %14.2

    changed milestone to %14.2

  • Stan Hu
  • Stan Hu
  • Stan Hu added 1 commit

    added 1 commit

    • e951e0a1 - Rate limit logging to once per second

    Compare with previous version

  • Luke Duncalfe
  • Stan Hu added 1 commit

    added 1 commit

    • cf65e3a1 - Move set_last_log_timestamp to private

    Compare with previous version

  • I left some minor comments, but it looks good to me :thumbsup:

  • Stan Hu added 2 commits

    added 2 commits

    • 29d87075 - Drop extraneous puts statement
    • 33a4819d - Avoid manipulating payload SQL directly

    Compare with previous version

  • Stan Hu added 258 commits

    added 258 commits

    • 33a4819d...bcdf86cb - 245 commits from branch master
    • a86c7eac - Add instrumentation for recording subtransaction depth
    • 329db813 - Track transaction savepoints outside of ApplicationRecord
    • b4d930ea - Log database subtransaction details into application structured log
    • 5fbc008e - Instrument subtransactions when a feature flag is enabled
    • 1a3ae094 - Fix database transaction context attributes
    • 70fd8598 - Add spec for transaction observer
    • 3dd81ccb - Add tests and fix bug in context duration calculation
    • f10abf95 - Handle Marginalia comments present in front of SQL query
    • 79eacb51 - Rate limit logging to once per second
    • 8c01cfeb - Move set_last_log_timestamp to private
    • 9d81d71c - Drop extraneous puts statement
    • ca5c27f9 - Avoid manipulating payload SQL directly
    • c110074e - Add rollout issue URL to feature flag definition

    Compare with previous version

  • Stan Hu
  • Stan Hu
  • Stan Hu added 1 commit

    added 1 commit

    • 8094b663 - Fix broken SQL command parsing spec

    Compare with previous version

  • Stan Hu added 1 commit

    added 1 commit

    Compare with previous version

  • Stan Hu
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon
  • Grzegorz Bizon requested review from @grzesiek

    requested review from @grzesiek

  • Grzegorz Bizon added 2 commits

    added 2 commits

    • b11f4ede - Do not instrument already instrumented transaction commit
    • 3d0a96f8 - Refactor checking if a transaction context should be logged

    Compare with previous version

  • Grzegorz Bizon added 1 commit

    added 1 commit

    • d5cc749a - Add an env variable check to disable transactions tracking

    Compare with previous version

  • Grzegorz Bizon
  • Grzegorz Bizon added 4 commits

    added 4 commits

    • 8af439d2 - Make it explicit how we match SQL in transaction observer
    • e045b1fd - Add long transactions tracking rollout issue reference
    • 64defc41 - Enable transactions tracking only on GitLab.com and tests
    • 4821cecc - Do not freeze immutable objects in transactions observer

    Compare with previous version

  • Grzegorz Bizon
  • Grzegorz Bizon added 1 commit

    added 1 commit

    • ce1e8e08 - Add an env variable that disables ActiveRecord patches

    Compare with previous version

  • Grzegorz Bizon requested review from @dgruzd

    requested review from @dgruzd

  • Dmitry Gruzd approved this merge request

    approved this merge request

  • Dmitry Gruzd
  • Dmitry Gruzd
  • Dmitry Gruzd
  • added databasereviewed label and removed databasereview pending label

  • Dmitry Gruzd requested review from @smcgivern and @mayra-cabrera and removed review request for @dgruzd

    requested review from @smcgivern and @mayra-cabrera and removed review request for @dgruzd

    • Resolved by Grzegorz Bizon

      :wave: @dgruzd, thanks for approving this merge request.

      Please consider starting a new pipeline if:

      • This is the first time the merge request is approved, or
      • The merge request is ready to be merged, and there has not been a merge request pipeline in the last 2 hours.

      For more info, refer to the guideline.

  • Dmitry Gruzd
  • Grzegorz Bizon added 1 commit

    added 1 commit

    • 803437ac - Apply 2 suggestion(s) to 2 file(s)

    Compare with previous version

  • Kamil Trzciński
  • Kamil Trzciński
  • Kamil Trzciński
  • Kamil Trzciński
  • Kamil Trzciński
  • Grzegorz Bizon added 2 commits

    added 2 commits

    • 49c7e094 - Add feature flag rollout issue and minor change in specs
    • 9bd3ba55 - Make ActiveRecord patch less instrusive by using a prefix

    Compare with previous version

  • Thanks @grzesiek and @stanhu. I think in general this looks OK :thumbsup: and is pretty much good enough for first iteration. I would ensure that we have proper coverage for FF check that is on a critical path.

    Edited by Kamil Trzciński
  • Sean McGivern
  • Sean McGivern
  • Sean McGivern removed review request for @smcgivern

    removed review request for @smcgivern

  • Grzegorz Bizon added 1 commit

    added 1 commit

    • 06b83e11 - Make ActiveRecord patching more controlable and explicit

    Compare with previous version

  • Grzegorz Bizon added 1 commit

    added 1 commit

    • dcdcfbc7 - Instrument transactions only if feature flags are available

    Compare with previous version

  • Stan Hu added 1 commit

    added 1 commit

    • 3935eb3f - Drop check of feature table check in initializer

    Compare with previous version

  • Mayra Cabrera
  • Mayra Cabrera
  • Mayra Cabrera approved this merge request

    approved this merge request

  • added databaseapproved label and removed databasereviewed label

  • Mayra Cabrera removed review request for @mayra-cabrera

    removed review request for @mayra-cabrera

  • Stan Hu added 1 commit

    added 1 commit

    • 18222568 - Fix typo in log throttle example

    Compare with previous version

  • Stan Hu mentioned in issue #338353

    mentioned in issue #338353

  • Stan Hu resolved all threads

    resolved all threads

  • Author Owner

    There is still some Geo-related test failure here that I'm trying to understand. For some reason it looks like the EE-implementation of Feature#enable isn't being called anymore.

    Also on my Omnibus instance, I'm seeing this failure with Geo:

    Feature::InvalidFeatureFlagError: The feature flag YAML definition for 'active_record_transactions_tracking' does not exist

    https://sentry.gitlab.net/gitlab/gitlab-geo-internal-develop-is/issues/2757899/

    Even after restarting Sidekiq, I'm seeing this.

  • Author Owner

    It seems calling ::Feature.enabled?(:active_record_transactions_tracking, type: :ops, default_enabled: :yaml) inside instrument_transactions is causing issues. It seems risky to be making any SQL calls inside the instrumentation code for SQL.

  • Stan Hu added 1 commit

    added 1 commit

    • e89454f2 - Remove feature flag checks inside SQL instrumentation

    Compare with previous version

  • Stan Hu resolved all threads

    resolved all threads

  • Stan Hu added 1 commit

    added 1 commit

    • 61e645c7 - Remove feature flag checks inside SQL instrumentation

    Compare with previous version

  • Stan Hu
  • Stan Hu requested review from @DylanGriffith and removed review request for @stanhu

    requested review from @DylanGriffith and removed review request for @stanhu

  • Stan Hu changed the description

    changed the description

  • Thong Kuah
  • Chad Woolley
  • Stan Hu added 1 commit

    added 1 commit

    • f33e06ce - Limit context monkey patches to certain environments

    Compare with previous version

  • Thong Kuah
  • Thong Kuah
  • Thong Kuah
  • Thong Kuah
  • Thong Kuah
  • Thong Kuah added 1 commit

    added 1 commit

    • 0855a6d7 - Use try just in case the method does not exist

    Compare with previous version

  • Dylan Griffith resolved all threads

    resolved all threads

  • Dylan Griffith approved this merge request

    approved this merge request

  • @tkuah and I have gone through and tested this and checked for any serious issues we could think of and made just minor tweaks.

    While there are probably some stylistic things we could keep polishing, in the interest of urgency here, I'm going to go ahead and set MWPS now anyway as we believe it works and there aren't any remaining serious performance issues and we believe we've done the best we can to mitigate risk with the use of the feature flags and env vars.

    We do also still believe this feels really quite hacky but also it seems we've done our best to balance urgency and maintainability here.

  • Dylan Griffith enabled an automatic merge when the pipeline for ffabc730 succeeds

    enabled an automatic merge when the pipeline for ffabc730 succeeds

  • Dylan Griffith mentioned in commit 51398378

    mentioned in commit 51398378

  • Successfully picked into 14-2-auto-deploy-2021081221. This merge request will receive additional notifications as it's deployed.

  • mentioned in issue #338306 (closed)

  • mentioned in issue #338411 (closed)

  • added workflowcanary label and removed workflowstaging label

  • Change request to enable this in production after it has been verified in staging: gitlab-com/gl-infra/production#5350 (closed)

  • added workflowproduction label and removed workflowcanary label

  • Heinrich Lee Yu mentioned in merge request !69744 (merged)

    mentioned in merge request !69744 (merged)

  • Alex Pooley mentioned in merge request !79964 (merged)

    mentioned in merge request !79964 (merged)

  • mentioned in issue #432828 (closed)

  • Please register or sign in to reply
    Loading