Investigate recent migration failures when using `sidekiq_remove_jobs`

Status

We think these migration failures are caused by the 30-second idle-in-transaction timeout.

Rails migrations open a DDL transaction by default. On production we may have a large number of Sidekiq jobs to remove, which keeps the migration's DDL transaction open for longer than 30s. PostgreSQL then closes the connection.
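The suspected mechanism can be illustrated with a toy model. This is only a sketch: `FakeSession` and `run_migration` are hypothetical names, the clock is simulated, and the 200-second duration is an illustrative value, not a measurement. It assumes the timeout counts idle time between statements inside an open transaction, and that the client only notices the closed connection on its next statement.

```ruby
# Toy model of a database session with an idle-in-transaction timeout.
# All names are hypothetical; this is not the pg gem or GitLab code.
class FakeSession
  class ConnectionClosed < StandardError; end

  def initialize(idle_timeout:)
    @idle_timeout = idle_timeout
    @now = 0.0                 # simulated clock, in seconds
    @last_statement_at = nil
    @in_transaction = false
    @closed = false
  end

  # Time spent outside the database, e.g. scanning Sidekiq queues in Redis.
  def work_elsewhere(seconds)
    @now += seconds
  end

  def begin_transaction
    execute("BEGIN")
    @in_transaction = true
  end

  def commit
    execute("COMMIT")
    @in_transaction = false
  end

  # The server drops the session if it sat idle inside a transaction for
  # longer than the timeout; the client finds out on the next statement.
  def execute(_sql)
    idle = @last_statement_at ? @now - @last_statement_at : 0.0
    @closed = true if @in_transaction && idle > @idle_timeout
    raise ConnectionClosed, "server closed the connection" if @closed

    @last_statement_at = @now
  end
end

def run_migration(session, redis_cleanup_seconds:, ddl_transaction: true)
  session.begin_transaction if ddl_transaction
  session.work_elsewhere(redis_cleanup_seconds) # stand-in for sidekiq_remove_jobs
  session.execute("INSERT INTO schema_migrations ...")
  session.commit if ddl_transaction
  :migrated
rescue FakeSession::ConnectionClosed
  :connection_closed
end

# A short cleanup finishes inside the transaction; a long one does not.
run_migration(FakeSession.new(idle_timeout: 30), redis_cleanup_seconds: 10)
run_migration(FakeSession.new(idle_timeout: 30), redis_cleanup_seconds: 200)
```

In this model the same long cleanup succeeds when no transaction is held open, because the idle timeout only applies to sessions that are inside a transaction.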

Workaround

Add `disable_ddl_transaction!` to the migration that uses `sidekiq_remove_jobs`.
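A minimal sketch of a migration with the workaround applied might look like the following. The migration class name, the worker name in `DEPRECATED_JOB_CLASSES`, the `[2.1]` migration version, and the `job_klasses:` keyword are assumptions for illustration. The stub at the top exists only so the snippet runs outside a GitLab checkout; inside GitLab it is skipped and the real base class and helper are used.

```ruby
# Stand-ins so this sketch runs outside a GitLab checkout. They are NOT the
# real implementation; inside GitLab this whole block is skipped.
unless defined?(Gitlab::Database::Migration)
  module Gitlab
    module Database
      module Migration
        class Base
          class << self
            attr_reader :ddl_transaction_disabled

            def disable_ddl_transaction!
              @ddl_transaction_disabled = true
            end
          end

          # Stub: the real helper removes queued jobs from Sidekiq.
          def sidekiq_remove_jobs(job_klasses:)
            job_klasses
          end
        end

        def self.[](_version)
          Base
        end
      end
    end
  end
end

# Hypothetical migration; the version number 2.1 is an assumption.
class RemoveExampleWorkerJobInstances < Gitlab::Database::Migration[2.1]
  # Run outside the default DDL transaction so that a slow job cleanup does
  # not leave a database transaction sitting idle.
  disable_ddl_transaction!

  # Hypothetical worker class name.
  DEPRECATED_JOB_CLASSES = %w[ExampleObsoleteWorker].freeze

  def up
    sidekiq_remove_jobs(job_klasses: DEPRECATED_JOB_CLASSES)
  end

  def down
    # No-op: removed jobs cannot be restored.
  end
end
```

Because `up` only touches Sidekiq, there are no DDL statements that need transactional protection, so disabling the wrapping transaction should not change what the migration does.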

Description

Three production incidents have been caused by the use of `sidekiq_remove_jobs` in migrations.

Migrations failed with:

stderr: |-
    rake aborted!
    StandardError: An error has occurred, this and all later migrations canceled:
  
    PG::ConnectionBad: PQsocket() can't get socket descriptor
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
    /opt/gitlab/embedded/bin/bundle:23:in `load'
    /opt/gitlab/embedded/bin/bundle:23:in `<main>'
Full error
fatal: [deploy-cny-01-sv-gprd]: FAILED! => changed=true 
  ansible_facts:
    discovered_interpreter_python: /usr/bin/python3
  cmd: SKIP_POST_DEPLOYMENT_MIGRATIONS=1 /usr/bin/gitlab-rake db:migrate
  delta: '0:03:22.440166'
  end: '2023-04-26 22:00:27.899184'
  msg: non-zero return code
  rc: 1
  start: '2023-04-26 21:57:05.459018'
  stderr: |-
    rake aborted!
    StandardError: An error has occurred, this and all later migrations canceled:
  
    PG::ConnectionBad: PQsocket() can't get socket descriptor
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
    /opt/gitlab/embedded/bin/bundle:23:in `load'
    /opt/gitlab/embedded/bin/bundle:23:in `<main>'
  
    Caused by:
    ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
    /opt/gitlab/embedded/bin/bundle:23:in `load'
    /opt/gitlab/embedded/bin/bundle:23:in `<main>'
  
    Caused by:
    PG::ConnectionBad: PQsocket() can't get socket descriptor
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:408:in `rollback'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
    /opt/gitlab/embedded/bin/bundle:23:in `load'
    /opt/gitlab/embedded/bin/bundle:23:in `<main>'
  
    Caused by:
    ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers/automatic_lock_writes_on_tables.rb:20:in `exec_migration'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
    /opt/gitlab/embedded/bin/bundle:23:in `load'
    /opt/gitlab/embedded/bin/bundle:23:in `<main>'
  
    Caused by:
    PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers/automatic_lock_writes_on_tables.rb:20:in `exec_migration'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:369:in `block in transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database.rb:368:in `transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
    /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
    /opt/gitlab/embedded/bin/bundle:23:in `load'
    /opt/gitlab/embedded/bin/bundle:23:in `<main>'
    Tasks: TOP => db:migrate
    (See full trace by running task with --trace)
  stderr_lines: <omitted>
  stdout: |-
    main: == [advisory_lock_connection] object_id: 227440, pg_backend_pid: 1087049
    main: == 20230414230535 AddExternalIdentifiersIndexToImportFailures: migrating ======
    main: -- transaction_open?()
    main:    -> 0.0000s
    main: -- view_exists?(:postgres_partitions)
    main:    -> 0.2308s
    main: -- index_exists?(:import_failures, :external_identifiers, {:name=>"index_import_failures_on_external_identifiers", :where=>"external_identifiers != '{}'", :algorithm=>:concurrently})
    main:    -> 0.1691s
    main: -- execute("SET statement_timeout TO 0")
    main:    -> 0.0338s
    main: -- add_index(:import_failures, :external_identifiers, {:name=>"index_import_failures_on_external_identifiers", :where=>"external_identifiers != '{}'", :algorithm=>:concurrently})
    main:    -> 25.1662s
    main: -- execute("RESET statement_timeout")
    main:    -> 0.0323s
    main: == 20230414230535 AddExternalIdentifiersIndexToImportFailures: migrated (26.0259s)
  
    main: == 20230419130952 RemoveGithubImportJobInstances: migrating ===================
  stdout_lines: <omitted>

`sidekiq_remove_jobs` is used when removing worker classes (see https://docs.gitlab.com/ee/development/sidekiq/compatibility_across_updates.html#in-a-subsequent-separate-minor-release), and it appears to be incompatible with the latest migration library.

Edited by Thong Kuah