Sidekiq stuck on Redis or MySQL connections
Customer has an HA setup with a MySQL database and Redis Sentinel. One node appears to have stuck Sidekiq jobs. When we first logged in, there were about 44 running jobs. About 30 minutes later, it crept up slowly to 55.
Attached is the redacted TTIN output. It looks like most of the threads have one of three backtraces:
DB release_connection
2017-04-24_21:19:41.83783 2017-04-24T21:19:41.836Z 30447 TID-otapkuv4w WARN: Thread TID-otan7jcis
2017-04-24_21:19:41.83783 2017-04-24T21:19:41.836Z 30447 TID-otapkuv4w WARN: /opt/gitlab/embedded/lib/ruby/2.3.0/monitor.rb:187:in `lock'
2017-04-24_21:19:41.83783 /opt/gitlab/embedded/lib/ruby/2.3.0/monitor.rb:187:in `mon_enter'
2017-04-24_21:19:41.83783 /opt/gitlab/embedded/lib/ruby/2.3.0/monitor.rb:212:in `mon_synchronize'
2017-04-24_21:19:41.83783 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:280:in `release_connection'
2017-04-24_21:19:41.83784 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:552:in `each'
2017-04-24_21:19:41.83784 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:552:in `clear_active_connections!'
2017-04-24_21:19:41.83784 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_handling.rb:129:in `clear_active_connections!'
2017-04-24_21:19:41.83784 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/sidekiq-4.2.7/lib/sidekiq/middleware/server/active_record.rb:8:in `call'
2017-04-24_21:19:41.83785 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/sidekiq-4.2.7/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
2017-04-24_21:19:41.83785 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/sidekiq-4.2.7/lib/sidekiq/middleware/server/retry_jobs.rb:74:in `call'
2017-04-24_21:19:41.83785 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/sidekiq-4.2.7/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
DB retrieve_connection
2017-04-24_21:19:41.85436 2017-04-24T21:19:41.853Z 30447 TID-otapkuv4w WARN: Thread TID-otapkqrk4
2017-04-24_21:19:41.85436 2017-04-24T21:19:41.853Z 30447 TID-otapkuv4w WARN: /opt/gitlab/embedded/lib/ruby/2.3.0/monitor.rb:187:in `lock'
2017-04-24_21:19:41.85437 /opt/gitlab/embedded/lib/ruby/2.3.0/monitor.rb:187:in `mon_enter'
2017-04-24_21:19:41.85437 /opt/gitlab/embedded/lib/ruby/2.3.0/monitor.rb:212:in `mon_synchronize'
2017-04-24_21:19:41.85437 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:262:in `connection'
2017-04-24_21:19:41.85437 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:571:in `retrieve_connection'
2017-04-24_21:19:41.85437 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_handling.rb:113:in `retrieve_connection'
2017-04-24_21:19:41.85438 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/connection_handling.rb:87:in `connection'
2017-04-24_21:19:41.85438 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.7.1/lib/active_record/core.rb:187:in `find_by'
2017-04-24_21:19:41.85438 /opt/gitlab/embedded/service/gitlab-rails/app/workers/build_queue_worker.rb:6:in `perform'
2017-04-24_21:19:41.85438 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/sidekiq-4.2.7/lib/sidekiq/processor.rb:158:in `execute_job'
2017-04-24_21:19:41.85438 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/sidekiq-4.2.7/lib/sidekiq/processor.rb:138:in `block (4 levels) in process'
Redis select
2017-04-24_21:19:41.84037 2017-04-24T21:19:41.839Z 30447 TID-otapkuv4w WARN: Thread TID-otan7j2wo
2017-04-24_21:19:41.84037 2017-04-24T21:19:41.840Z 30447 TID-otapkuv4w WARN: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/connection/ruby.rb:52:in `select'
2017-04-24_21:19:41.84038 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/connection/ruby.rb:52:in `rescue in _read_from_socket'
2017-04-24_21:19:41.84038 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/connection/ruby.rb:48:in `_read_from_socket'
2017-04-24_21:19:41.84038 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/connection/ruby.rb:41:in `gets'
2017-04-24_21:19:41.84038 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/connection/ruby.rb:273:in `read'
2017-04-24_21:19:41.84038 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/client.rb:249:in `block in read'
2017-04-24_21:19:41.84039 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/client.rb:237:in `io'
2017-04-24_21:19:41.84039 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/client.rb:248:in `read'
2017-04-24_21:19:41.84039 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/client.rb:113:in `block in call'
2017-04-24_21:19:41.84039 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/client.rb:218:in `block (2 levels) in process'
2017-04-24_21:19:41.84039 /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/redis-3.2.2/lib/redis/client.rb:354:in `ensure_connected'
This suggests to me that Sidekiq got into some deadlock trying to synchronize threads for the database connection pool. Some relevant issues that point the blame to issues with Timeout.timeout:
- https://github.com/rails/rails/issues/10039
- https://github.com/rails/rails/issues/22031
- https://github.com/mperham/sidekiq/issues/1047#issuecomment-20641031
Note that BOTH mysql2 and redis-rb gems have updates to fix issues with Timeout:
- https://github.com/brianmario/mysql2/releases
- https://github.com/redis/redis-rb/blob/master/CHANGELOG.md#333
Here is what we are doing about it:
- Upgraded to the latest mysql2 gem: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1715
- Upgraded to the latest Sidekiq (still uses redis-rb 3.2 that uses
Timeout): https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10899 - Upgraded Sidekiq to the latest redis-rb version that drops the use of
Timeout: https://github.com/mperham/sidekiq/pull/3441 - Created an issue to remove
Timeoutfrom our LDAP code: https://gitlab.com/gitlab-org/gitlab-ce/issues/31369
/cc: @gl-support
ZenDesk: https://gitlab.zendesk.com/agent/tickets/72840, https://gitlab.zendesk.com/agent/tickets/56118
Edited by 🤖 GitLab Bot 🤖