When two_step_rebase is enabled, rebases that take longer than idle_in_transaction_session_timeout will fail

Summary

Highly diverged branches on large repositories can take several minutes to rebase. However, when using two_step_rebase the rebase must finish before the idle_in_transaction_session_timeout limit (60 seconds by default in Omnibus) or Postgres will cancel the transaction. As a result, rebases on very large projects can consistently fail.

Steps to reproduce

  1. Clone a large project such as the Linux kernel
  2. Set Merge method to Merge commit with semi-linear history or Fast-forward merge under Settings -> General -> Merge requests -> Merge method
  3. Checkout an older tag such as v2.6.28, old enough that rebasing will take greater than 60 seconds
  4. Create a new branch and make a change
  5. Make a change on master
  6. Create an MR using the new branch
  7. Click rebase
  8. Eventually a Fast-forward merge is not possible message will be shown in the UI, in the background the RebaseWorker job has failed

What is the current bug behavior?

Large rebases fail to complete

What is the expected correct behavior?

Rebases of any size can run to completion

Relevant logs and/or screenshots

{
  "severity": "WARN",
  "time": "2019-10-10T20:23:09.424Z",
  "class": "RebaseWorker",
  "retry": 3,
  "queue": "rebase",
  "jid": "21ff7936418b9e101c448fb1",
  "created_at": "2019-10-10T20:21:39.114Z",
  "correlation_id": "lGclP00Hs26",
  "enqueued_at": "2019-10-10T20:21:39.115Z",
  "pid": 13368,
  "message": "RebaseWorker JID-21ff7936418b9e101c448fb1: fail: 90.306 sec",
  "job_status": "fail",
  "scheduling_latency_s": 0.002,
  "duration": 90.306,
  "system_s": 0.979,
  "user_s": 4.074,
  "child_s": 0.077,
  "completed_at": "2019-10-10T20:23:09.424Z",
  "error_message": "Sidekiq::JobRetry::Skip",
  "error": {},
  "error_backtrace": [
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:127:in `rescue in local'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:108:in `local'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:136:in `block (5 levels) in dispatch'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/rails.rb:43:in `block in call'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:73:in `block in wrap'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/execution_wrapper.rb:87:in `wrap'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activesupport-5.2.3/lib/active_support/reloader.rb:72:in `wrap'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/rails.rb:42:in `call'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:132:in `block (4 levels) in dispatch'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:250:in `stats'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:127:in `block (3 levels) in dispatch'",
    "/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/sidekiq_logging/structured_logger.rb:16:in `call'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:126:in `block (2 levels) in dispatch'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:74:in `global'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:125:in `block in dispatch'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/logging.rb:48:in `with_context'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/logging.rb:42:in `with_job_hash_context'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:124:in `dispatch'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:163:in `process'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:83:in `process_one'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:71:in `run'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:16:in `watchdog'",
    "/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:25:in `block in safe_thread'"
  ]
}
{
  "severity": "WARN",
  "time": "2019-10-10T20:23:09.425Z",
  "message": "{\"context\":\"Job raised exception\",\"job\":{\"class\":\"RebaseWorker\",\"args\":[8,1],\"retry\":3,\"queue\":\"rebase\",\"jid\":\"21ff7936418b9e101c448fb1\",\"created_at\":1570738899.1141646,\"correlation_id\":\"lGclP00Hs26\",\"enqueued_at\":1570738899.1150746},\"jobstr\":\"{\\\"class\\\":\\\"RebaseWorker\\\",\\\"args\\\":[8,1],\\\"retry\\\":3,\\\"queue\\\":\\\"rebase\\\",\\\"jid\\\":\\\"21ff7936418b9e101c448fb1\\\",\\\"created_at\\\":1570738899.1141646,\\\"correlation_id\\\":\\\"lGclP00Hs26\\\",\\\"enqueued_at\\\":1570738899.1150746}\"}"
}
{
  "severity": "INFO",
  "time": "2019-10-10T20:23:25.043Z",
  "class": "RebaseWorker",
  "retry": 3,
  "queue": "rebase",
  "jid": "21ff7936418b9e101c448fb1",
  "created_at": "2019-10-10T20:21:39.114Z",
  "correlation_id": "lGclP00Hs26",
  "enqueued_at": "2019-10-10T20:23:25.038Z",
  "error_message": "PG::UnableToSend: no connection to the server\n: UPDATE \"merge_requests\" SET \"rebase_jid\" = NULL WHERE \"merge_requests\".\"id\" = 8",
  "error_class": "ActiveRecord::StatementInvalid",
  "failed_at": 1570738989.4215744,
  "retry_count": 0,
  "pid": 13368,
  "message": "RebaseWorker JID-21ff7936418b9e101c448fb1: start",
  "job_status": "start",
  "scheduling_latency_s": 0.005
}

Output of checks

Results of GitLab environment info

Expand for output related to GitLab environment info

System information System: Ubuntu 18.04 Proxy: no Current User: git Using RVM: no Ruby Version: 2.6.3p62 Gem Version: 2.7.9 Bundler Version:1.17.3 Rake Version: 12.3.2 Redis Version: 3.2.12 Git Version: 2.22.0 Sidekiq Version:5.2.7 Go Version: unknown

GitLab information Version: 12.3.2-ee Revision: 9ae54250d64 Directory: /opt/gitlab/embedded/service/gitlab-rails DB Adapter: PostgreSQL DB Version: 10.9 Geo: no Using LDAP: no Using Omniauth: yes Omniauth Providers:

GitLab Shell Version: 10.0.0 Repository storage paths:

  • default: /var/opt/gitlab/git-data/repositories GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell Git: /opt/gitlab/embedded/bin/git

Results of GitLab application Check

Expand for output related to the GitLab application check

Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 10.0.0 ? ... OK (10.0.0) Running /opt/gitlab/embedded/service/gitlab-shell/bin/check Check GitLab API access: OK Redis available via internal API: OK

gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet) Init script exists? ... skipped (omnibus-gitlab has no init script) Init script up-to-date? ... skipped (omnibus-gitlab has no init script) Projects have namespace: ... 1/1 ... yes 1/2 ... yes Redis version >= 2.8.0? ... yes Ruby version >= 2.5.3 ? ... yes (2.6.3) Git version >= 2.22.0 ? ... yes (2.22.0) Git user has default SSH configuration? ... yes Active users: ... 2 Is authorized keys file accessible? ... yes Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished

Possible fixes

The problem is caused by attempting to perform large rebases in the time permitted by a BEGIN block in Postgres, https://gitlab.com/gitlab-org/gitlab/blob/v12.3.2-ee/app%2Fmodels%2Frepository.rb#L1044-1054. A longer timeout for the transaction, ideally configurable by admins, would work around this issue.

The RebaseWorker will start the DB transaction, then execute the UserRebaseConfirmable Gitaly RPC. However, when the rebase takes over 60 seconds PG releases the lock and closes the connection while the RebaseWorker continues to wait for Gitaly. After UserRebaseConfirmable eventually returns the worker attempts to continue making DB queries, but PG returns terminating connection due to idle-in-transaction timeout and the job fails.

In strace we see:

27585 20:21:39.180018 sendto(102<UNIX:[127187->127188]>, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11 <0.000103>
27585 20:21:39.180209 recvfrom(102<UNIX:[127187->127188]>, "C\0\0\0\nBEGIN\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 17 <0.000020>
# REBASE RUNS HERE AND RETURNS SHA
27585 20:23:09.407970 sendto(102<UNIX:[127187->127188]>, "P\0\0\1q\0SELECT  \"merge_requests\".* FROM \"merge_requests\" WHERE \"merge_requests\".\"target_project_id\" = 2 AND \"merge_requests\".\"source_branch\" = 'short-rebase' AND \"merge_requests\".\"target_branch\" = 'master' AND \"merge_requests\".\"source_project_id\" = 2 AND (\"merge_requests\".\"state\" IN ('opened')) AND \"merge_requests\".\"id\" != 8 ORDER BY \"merge_requests\".\"id\" ASC LIMIT 1\0\0\0B\0\0\0\16\0\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0\0\0\0\0S\0\0\0\4", 407, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe) <0.000050>
27585 20:23:09.408160 recvfrom(102<UNIX:[127187->127188]>, "E\0\0\0zSFATAL\0VFATAL\0C25P03\0Mterminating connection due to idle-in-transaction timeout\0Fpostgres.c\0L3036\0RProcessInterrupts\0\0", 16384, 0, NULL, NULL) = 123 <0.000025>

/cc @oswaldo

Assignee Loading
Time tracking Loading