Investigate recent migration failures when using `sidekiq_remove_jobs`
Status
We think this migration failures are caused by the 30 second idle in transaction timeout.
Rails migrations have a DDL transcation open by default. On production we may have a large number of Sidekiq jobs to be removed, which causes the migration DDL transaction to be open longer than 30s. PostgreSQL then closes the connection.
Workaround
Add disable_ddl_transaction!
to the migration that uses sidekiq_remove_jobs
Description
Two Three production incidents have been caused by the usage of remove_sidekiq_jobs
on migrations:
- gitlab-com/gl-infra/production#8612 (closed) caused by !113899 (merged)
- gitlab-com/gl-infra/production#8612 (closed) caused by !118049 (merged)
Migrations failed with:
stderr: |-
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:
PG::ConnectionBad: PQsocket() can't get socket descriptor
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Full error
fatal: [deploy-cny-01-sv-gprd]: FAILED! => changed=true
ansible_facts:
discovered_interpreter_python: /usr/bin/python3
cmd: SKIP_POST_DEPLOYMENT_MIGRATIONS=1 /usr/bin/gitlab-rake db:migrate
delta: '0:03:22.440166'
end: '2023-04-26 22:00:27.899184'
msg: non-zero return code
rc: 1
start: '2023-04-26 21:57:05.459018'
stderr: |-
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:
PG::ConnectionBad: PQsocket() can't get socket descriptor
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Caused by:
ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Caused by:
PG::ConnectionBad: PQsocket() can't get socket descriptor
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Caused by:
ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers/automatic_lock_writes_on_tables.rb:20:in `exec_migration'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Caused by:
PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers/automatic_lock_writes_on_tables.rb:20:in `exec_migration'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
stderr_lines: <omitted>
stdout: |-
main: == [advisory_lock_connection] object_id: 227440, pg_backend_pid: 1087049
main: == 20230414230535 AddExternalIdentifiersIndexToImportFailures: migrating ======
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.2308s
main: -- index_exists?(:import_failures, :external_identifiers, {:name=>"index_import_failures_on_external_identifiers", :where=>"external_identifiers != '{}'", :algorithm=>:concurrently})
main: -> 0.1691s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0338s
main: -- add_index(:import_failures, :external_identifiers, {:name=>"index_import_failures_on_external_identifiers", :where=>"external_identifiers != '{}'", :algorithm=>:concurrently})
main: -> 25.1662s
main: -- execute("RESET statement_timeout")
main: -> 0.0323s
main: == 20230414230535 AddExternalIdentifiersIndexToImportFailures: migrated (26.0259s)
main: == 20230419130952 RemoveGithubImportJobInstances: migrating ===================
stdout_lines: <omitted>
The sidekiq_remove_jobs
is used to remove worker classes https://docs.gitlab.com/ee/development/sidekiq/compatibility_across_updates.html#in-a-subsequent-separate-minor-release and it appears it is not compatible with the latest migration library.