When two_step_rebase is enabled, rebases that take longer than idle_in_transaction_session_timeout will fail
Summary
Highly diverged branches on large repositories can take several minutes to rebase. However, when using two_step_rebase the rebase must finish before the idle_in_transaction_session_timeout limit (60 seconds by default in Omnibus) or Postgres will cancel the transaction. As a result, rebases on very large projects can consistently fail.
Steps to reproduce
- Clone a large project such as the Linux kernel
- Set
Merge methodtoMerge commit with semi-linear historyorFast-forward mergeunderSettings -> General -> Merge requests -> Merge method - Checkout an older tag such as
v2.6.28, old enough that rebasing will take greater than 60 seconds - Create a new branch and make a change
- Make a change on master
- Create an MR using the new branch
- Click rebase
- Eventually a
Fast-forward merge is not possiblemessage will be shown in the UI, in the background theRebaseWorkerjob has failed
What is the current bug behavior?
Large rebases fail to complete
What is the expected correct behavior?
Rebases of any size can run to completion
Relevant logs and/or screenshots
{
"severity": "WARN",
"time": "2019-10-10T20:23:09.424Z",
"class": "RebaseWorker",
"retry": 3,
"queue": "rebase",
"jid": "21ff7936418b9e101c448fb1",
"created_at": "2019-10-10T20:21:39.114Z",
"correlation_id": "lGclP00Hs26",
"enqueued_at": "2019-10-10T20:21:39.115Z",
"pid": 13368,
"message": "RebaseWorker JID-21ff7936418b9e101c448fb1: fail: 90.306 sec",
"job_status": "fail",
"scheduling_latency_s": 0.002,
"duration": 90.306,
"system_s": 0.979,
"user_s": 4.074,
"child_s": 0.077,
"completed_at": "2019-10-10T20:23:09.424Z",
"error_message": "Sidekiq::JobRetry::Skip",
"error": {},
"error_backtrace": [
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:127:in `rescue in local'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:108:in `local'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:136:in `block (5 levels) in dispatch'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/rails.rb:43:in `block in call'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/rails.rb:42:in `call'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:132:in `block (4 levels) in dispatch'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:250:in `stats'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:127:in `block (3 levels) in dispatch'",
"/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/sidekiq_logging/structured_logger.rb:16:in `call'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:126:in `block (2 levels) in dispatch'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:74:in `global'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:125:in `block in dispatch'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/logging.rb:48:in `with_context'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/logging.rb:42:in `with_job_hash_context'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:124:in `dispatch'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:163:in `process'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:83:in `process_one'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:71:in `run'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:16:in `watchdog'",
"/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:25:in `block in safe_thread'"
]
}
{
"severity": "WARN",
"time": "2019-10-10T20:23:09.425Z",
"message": "{\"context\":\"Job raised exception\",\"job\":{\"class\":\"RebaseWorker\",\"args\":[8,1],\"retry\":3,\"queue\":\"rebase\",\"jid\":\"21ff7936418b9e101c448fb1\",\"created_at\":1570738899.1141646,\"correlation_id\":\"lGclP00Hs26\",\"enqueued_at\":1570738899.1150746},\"jobstr\":\"{\\\"class\\\":\\\"RebaseWorker\\\",\\\"args\\\":[8,1],\\\"retry\\\":3,\\\"queue\\\":\\\"rebase\\\",\\\"jid\\\":\\\"21ff7936418b9e101c448fb1\\\",\\\"created_at\\\":1570738899.1141646,\\\"correlation_id\\\":\\\"lGclP00Hs26\\\",\\\"enqueued_at\\\":1570738899.1150746}\"}"
}
{
"severity": "INFO",
"time": "2019-10-10T20:23:25.043Z",
"class": "RebaseWorker",
"retry": 3,
"queue": "rebase",
"jid": "21ff7936418b9e101c448fb1",
"created_at": "2019-10-10T20:21:39.114Z",
"correlation_id": "lGclP00Hs26",
"enqueued_at": "2019-10-10T20:23:25.038Z",
"error_message": "PG::UnableToSend: no connection to the server\n: UPDATE \"merge_requests\" SET \"rebase_jid\" = NULL WHERE \"merge_requests\".\"id\" = 8",
"error_class": "ActiveRecord::StatementInvalid",
"failed_at": 1570738989.4215744,
"retry_count": 0,
"pid": 13368,
"message": "RebaseWorker JID-21ff7936418b9e101c448fb1: start",
"job_status": "start",
"scheduling_latency_s": 0.005
}
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
System information System: Ubuntu 18.04 Proxy: no Current User: git Using RVM: no Ruby Version: 2.6.3p62 Gem Version: 2.7.9 Bundler Version:1.17.3 Rake Version: 12.3.2 Redis Version: 3.2.12 Git Version: 2.22.0 Sidekiq Version:5.2.7 Go Version: unknown
GitLab information Version: 12.3.2-ee Revision: 9ae54250d64 Directory: /opt/gitlab/embedded/service/gitlab-rails DB Adapter: PostgreSQL DB Version: 10.9 Geo: no Using LDAP: no Using Omniauth: yes Omniauth Providers:
GitLab Shell Version: 10.0.0 Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Expand for output related to the GitLab application check
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 10.0.0 ? ... OK (10.0.0) Running /opt/gitlab/embedded/service/gitlab-shell/bin/check Check GitLab API access: OK Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet) Init script exists? ... skipped (omnibus-gitlab has no init script) Init script up-to-date? ... skipped (omnibus-gitlab has no init script) Projects have namespace: ... 1/1 ... yes 1/2 ... yes Redis version >= 2.8.0? ... yes Ruby version >= 2.5.3 ? ... yes (2.6.3) Git version >= 2.22.0 ? ... yes (2.22.0) Git user has default SSH configuration? ... yes Active users: ... 2 Is authorized keys file accessible? ... yes Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
Possible fixes
The problem is caused by attempting to perform large rebases in the time permitted by a BEGIN block in Postgres, https://gitlab.com/gitlab-org/gitlab/blob/v12.3.2-ee/app%2Fmodels%2Frepository.rb#L1044-1054. A longer timeout for the transaction, ideally configurable by admins, would work around this issue.
The RebaseWorker will start the DB transaction, then execute the UserRebaseConfirmable Gitaly RPC. However, when the rebase takes over 60 seconds PG releases the lock and closes the connection while the RebaseWorker continues to wait for Gitaly. After UserRebaseConfirmable eventually returns the worker attempts to continue making DB queries, but PG returns terminating connection due to idle-in-transaction timeout and the job fails.
In strace we see:
27585 20:21:39.180018 sendto(102<UNIX:[127187->127188]>, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11 <0.000103>
27585 20:21:39.180209 recvfrom(102<UNIX:[127187->127188]>, "C\0\0\0\nBEGIN\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 17 <0.000020>
# REBASE RUNS HERE AND RETURNS SHA
27585 20:23:09.407970 sendto(102<UNIX:[127187->127188]>, "P\0\0\1q\0SELECT \"merge_requests\".* FROM \"merge_requests\" WHERE \"merge_requests\".\"target_project_id\" = 2 AND \"merge_requests\".\"source_branch\" = 'short-rebase' AND \"merge_requests\".\"target_branch\" = 'master' AND \"merge_requests\".\"source_project_id\" = 2 AND (\"merge_requests\".\"state\" IN ('opened')) AND \"merge_requests\".\"id\" != 8 ORDER BY \"merge_requests\".\"id\" ASC LIMIT 1\0\0\0B\0\0\0\16\0\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0\0\0\0\0S\0\0\0\4", 407, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe) <0.000050>
27585 20:23:09.408160 recvfrom(102<UNIX:[127187->127188]>, "E\0\0\0zSFATAL\0VFATAL\0C25P03\0Mterminating connection due to idle-in-transaction timeout\0Fpostgres.c\0L3036\0RProcessInterrupts\0\0", 16384, 0, NULL, NULL) = 123 <0.000025>
/cc @oswaldo