Add instrumentation for recording subtransaction depth
A number of recent, performance slowdowns in the PostgreSQL database due (e.g. gitlab-com/gl-infra/production#5333 (closed)) that cause a high rate of statement timeouts. It looks like most queries are waiting on SubtransControlLock
, which indicates something is going on with how PostgreSQL is handling subtransactions.
We know that if the number of nested subtransactions exceeds 64, there is a significant performance penalty (see https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/), but right now we don't know if any migrations or Sidekiq jobs are actually using that many nested subtransactions.
This merge request instruments ActiveRecord
in order to find the source of nested subtransactions. This change reports a log message into application_json.log
when the depth exceeds 32 subtransactions or the overall transaction exceeds 5 minutes.
A sample log message with thresholds removed:
{
"severity": "INFO",
"time": "2021-08-12T21:18:07.628Z",
"correlation_id": "210c961bee1feb2e68a22fec77ef71fe",
"class": "Gitlab::Database::Transaction::Context",
"result": "commit",
"duration_s": 0.009594183415174484,
"depth": 0,
"savepoints_count": 0,
"rollbacks_count": 0,
"releases_count": 0,
"sql": "/*application:sidekiq,correlation_id:210c961bee1feb2e68a22fec77ef71fe,jid:c0449496fd536dccd23889fc,endpoint_id:Geo::VerificationBatchWorker*/ BEGIN"
}
Merge request reports
Activity
assigned to @stanhu
- A deleted user
added databasereview pending label
5 Warnings This merge request includes more than 10 commits. Each commit should meet the following criteria: - Have a well-written commit message.
- Has all tests passing when used on its own (e.g. when using git checkout SHA).
- Can be reverted on its own without also requiring the revert of commit that came before it.
- Is small enough that it can be reviewed in isolation in under 30 minutes or so.
If this merge request contains commits that do not meet this criteria and/or contains intermediate work, please rebase these commits into a smaller number of commits or split this merge request into multiple smaller merge requests.
61e645c7: The commit body should not contain more than 72 characters per line. For more information, take a look at our Commit message guidelines. b4d930ea: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. 329db813: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. Please add a merge request type to this merge request. Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category Reviewer Maintainer backend Marc Shaw ( @marc_shaw
) (UTC+2, 9 hours ahead of@stanhu
)Dmitry Gruzd ( @dgruzd
) (UTC+3, 10 hours ahead of@stanhu
)database Andy Soiron ( @Andysoiron
) (UTC+2, 9 hours ahead of@stanhu
)Mayra Cabrera ( @mayra-cabrera
) (UTC-5, 2 hours ahead of@stanhu
)To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Dangermentioned in issue #337843 (closed)
- Resolved by Stan Hu
assigned to @grzesiek
- Resolved by Grzegorz Bizon
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
added 1 commit
- a72a48e7 - Track transaction savepoints outside of ApplicationRecord
- Resolved by Stan Hu
- Resolved by Grzegorz Bizon
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Grzegorz Bizon
added 1 commit
- bb71d5f0 - Log database subtransaction details into application structured log
- A deleted user
added grouppipeline execution label
- A deleted user
added feature flag label
added 1 commit
- a5f31141 - Instrument subtransactions when a feature flag is enabled
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
@stanhu This still needs some unit tests but otherwise it seems to be almost ready. Can you take a look for an initial review? Thanks in advance! /cc @sgoldstein
requested review from @stanhu
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
I think that what remains in this MR is:
- Unit tests (especially for the patching and
Transaction::Context
) - Manual tests to see that it works as expected outside of the console
- Defining a good threshold for
def meaninful?
method that return true if we should log this transaction context. - Checking if the queries log contains enough information for us to use, or should we extend tracking even more.
/cc @stanhu @sgoldstein
- Unit tests (especially for the patching and
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
mentioned in issue gitlab-com/gl-infra/production#5333 (closed)
mentioned in issue #338291
added 1 commit
- 282b2854 - Add tests and fix bug in context duration calculation
- Resolved by Stan Hu
- Resolved by Grzegorz Bizon
In my GDK, this merge request works fine if I modify
meaningful?
toreturn true
. With this script on the console:project = Project.first ApplicationRecord.transaction do ::Vulnerabilities::HistoricalStatistic.safe_ensure_unique(retries: 1) do project.vulnerability_historical_statistics.find_or_initialize_by(date: Time.now); nil end end
I see:
{ "severity": "INFO", "time": "2021-08-11T23:07:49.913Z", "correlation_id": null, "class": "Gitlab::Database::Transaction::Context", "result": "commit", "duration_s": 0.016879000002518296, "depth": 2, "savepoints_count": 1, "rollbacks_count": 0, "releases_count": 1, "sql": "BEGIN /*application:console,line:/lib/gitlab/database/schema_cache_with_renamed_table.rb:25:in `columns'*/" }
However, when I applied this change to my Omnibus instance, I got:
{ "severity": "INFO", "time": "2021-08-11T23:10:00.357Z", "correlation_id": null, "class": "Gitlab::Database::Transaction::Context", "result": "commit", "duration_s": null, "depth": 0, "savepoints_count": 0, "rollbacks_count": 0, "releases_count": 0, "sql": "" }
I hope this is just an issue with my test setup, but I have a feeling there's something more here.
added 1 commit
- 51e6b005 - Handle Marginalia comments present in front of SQL query
changed milestone to %14.2
- Resolved by Grzegorz Bizon
- Resolved by Stan Hu
Setting label(s) devopsverify sectionops based on grouppipeline execution.
added devopsverify sectionops labels
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Dylan Griffith
- Resolved by Dylan Griffith
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
added 2 commits
added 258 commits
-
33a4819d...bcdf86cb - 245 commits from branch
master
- a86c7eac - Add instrumentation for recording subtransaction depth
- 329db813 - Track transaction savepoints outside of ApplicationRecord
- b4d930ea - Log database subtransaction details into application structured log
- 5fbc008e - Instrument subtransactions when a feature flag is enabled
- 1a3ae094 - Fix database transaction context attributes
- 70fd8598 - Add spec for transaction observer
- 3dd81ccb - Add tests and fix bug in context duration calculation
- f10abf95 - Handle Marginalia comments present in front of SQL query
- 79eacb51 - Rate limit logging to once per second
- 8c01cfeb - Move set_last_log_timestamp to private
- 9d81d71c - Drop extraneous puts statement
- ca5c27f9 - Avoid manipulating payload SQL directly
- c110074e - Add rollout issue URL to feature flag definition
Toggle commit list-
33a4819d...bcdf86cb - 245 commits from branch
- Resolved by Stan Hu
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
mentioned in issue gitlab-com/gl-infra/production#5122 (closed)
requested review from @grzesiek
added 1 commit
- d5cc749a - Add an env variable check to disable transactions tracking
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
added 4 commits
Toggle commit list- Resolved by Grzegorz Bizon
added 1 commit
- ce1e8e08 - Add an env variable that disables ActiveRecord patches
- Resolved by Mayra Cabrera
As per #337843 (closed) @dgruzd would you be able to take a first pass and review this merge request? Thanks in advance!
requested review from @dgruzd
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
added databasereviewed label and removed databasereview pending label
requested review from @smcgivern and @mayra-cabrera and removed review request for @dgruzd
- Resolved by Grzegorz Bizon
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
Thanks @grzesiek and @stanhu. I think in general this looks OK
and is pretty much good enough for first iteration. I would ensure that we have proper coverage for FF check that is on a critical path.Edited by Kamil Trzciński- Resolved by Grzegorz Bizon
- Resolved by Grzegorz Bizon
- Resolved by Stan Hu
- Resolved by Stan Hu
removed review request for @smcgivern
added 1 commit
- 06b83e11 - Make ActiveRecord patching more controlable and explicit
added 1 commit
- dcdcfbc7 - Instrument transactions only if feature flags are available
added 1 commit
- 3935eb3f - Drop check of feature table check in initializer
- Resolved by Stan Hu
- Resolved by Stan Hu
- Resolved by Stan Hu
added databaseapproved label and removed databasereviewed label
removed review request for @mayra-cabrera
mentioned in issue #338353
There is still some Geo-related test failure here that I'm trying to understand. For some reason it looks like the EE-implementation of
Feature#enable
isn't being called anymore.Also on my Omnibus instance, I'm seeing this failure with Geo:
Feature::InvalidFeatureFlagError: The feature flag YAML definition for 'active_record_transactions_tracking' does not exist
https://sentry.gitlab.net/gitlab/gitlab-geo-internal-develop-is/issues/2757899/
Even after restarting Sidekiq, I'm seeing this.
added 1 commit
- e89454f2 - Remove feature flag checks inside SQL instrumentation
added 1 commit
- 61e645c7 - Remove feature flag checks inside SQL instrumentation
- Resolved by Dylan Griffith
requested review from @DylanGriffith and removed review request for @stanhu
- Resolved by Dylan Griffith
- Resolved by Dylan Griffith
added Pick into auto-deploy severity2 labels
added 1 commit
- f33e06ce - Limit context monkey patches to certain environments
- Resolved by Thong Kuah
- Resolved by Thong Kuah
- Resolved by Thong Kuah
- Resolved by Dylan Griffith
- Resolved by Thong Kuah
added 1 commit
- 0855a6d7 - Use try just in case the method does not exist
@tkuah and I have gone through and tested this and checked for any serious issues we could think of and made just minor tweaks.
While there are probably some stylistic things we could keep polishing, in the interest of urgency here, I'm going to go ahead and set MWPS now anyway as we believe it works and there aren't any remaining serious performance issues and we believe we've done the best we can to mitigate risk with the use of the feature flags and env vars.
We do also still believe this feels really quite hacky but also it seems we've done our best to balance urgency and maintainability here.
enabled an automatic merge when the pipeline for ffabc730 succeeds
mentioned in commit 51398378
Successfully picked into
14-2-auto-deploy-2021081221
. This merge request will receive additional notifications as it's deployed.removed Pick into auto-deploy label
mentioned in issue #338306 (closed)
added workflowstaging label
mentioned in issue #338411 (closed)
added workflowcanary label and removed workflowstaging label
mentioned in issue gitlab-com/gl-infra/production#5350 (closed)
Change request to enable this in production after it has been verified in staging: gitlab-com/gl-infra/production#5350 (closed)
added workflowproduction label and removed workflowcanary label
mentioned in issue gitlab-com/gl-infra/on-call-handovers#1933 (closed)
added releasedcandidate label
added releasedpublished label and removed releasedcandidate label
mentioned in merge request !69744 (merged)
mentioned in merge request !79964 (merged)
mentioned in issue #432828 (closed)
mentioned in issue gitlab-org/quality/triage-reports#20613 (closed)
mentioned in issue gitlab-org/quality/triage-reports#20978 (closed)
mentioned in issue gitlab-org/quality/triage-reports#21525 (closed)
mentioned in issue gitlab-org/quality/triage-reports#22029