Errors after upgrade to 16.4: SystemStackError (stack level too deep) coming from Sidekiq/Geo::VerificationStateBackfillWorker
While still investigating, one new error related to Geo on primary I've noticed after updating to 16.4.1 is
SystemStackError
(stack level too deep
) coming fromSidekiq/Geo::VerificationStateBackfillWorker
. Is it possible there is some kind of an infinite loop happening during primary checksum recalculation?
Our Gitlab is connected to Sentry so I have all errors there. Over 1k of this one since the update.
Stacktrace
SystemStackError: stack level too deep from activesupport (7.0.6) lib/active_support/core_ext/enumerable.rb:248:in `in_order_of' from activerecord (7.0.6) lib/active_record/relation/finder_methods.rb:510:in `find_some_ordered' from activerecord (7.0.6) lib/active_record/relation/finder_methods.rb:481:in `find_some' from activerecord (7.0.6) lib/active_record/relation/finder_methods.rb:460:in `find_with_ids' from activerecord (7.0.6) lib/active_record/relation/finder_methods.rb:69:in `find' from activerecord (7.0.6) lib/active_record/querying.rb:22:in `find' from activerecord (7.0.6) lib/active_record/core.rb:275:in `find' from geo/verification_state_backfill_service.rb:72:in `create_verification_details' from geo/verification_state_backfill_service.rb:65:in `handle_differences_in_verifiables' from geo/verification_state_backfill_service.rb:24:in `execute' from geo/verifiable_replicator.rb:124:in `backfill_verification_state_table' from geo/verification_state_backfill_worker.rb:20:in `perform' from reenqueuer.rb:49:in `block (3 levels) in perform' from reenqueuer.rb:83:in `ensure_minimum_duration' from reenqueuer.rb:48:in `block (2 levels) in perform' from reenqueuer.rb:58:in `reenqueue' from reenqueuer.rb:47:in `block in perform' from exclusive_lease_guard.rb:29:in `try_obtain_lease' from reenqueuer.rb:46:in `perform' from sidekiq (6.5.7) lib/sidekiq/processor.rb:202:in `execute_job' from sidekiq (6.5.7) lib/sidekiq/processor.rb:170:in `block (2 levels) in process' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:177:in `block in invoke' from gitlab/sidekiq_middleware/skip_jobs.rb:49:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/database/load_balancing/sidekiq_server_middleware.rb:29:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/duplicate_jobs/strategies/none.rb:14:in `perform' from gitlab/sidekiq_middleware/duplicate_jobs/duplicate_job.rb:44:in `perform' from gitlab/sidekiq_middleware/duplicate_jobs/server.rb:8:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/pause_control/strategies/base.rb:31:in `perform' from gitlab/sidekiq_middleware/pause_control/strategy_handler.rb:22:in `perform' from gitlab/sidekiq_middleware/pause_control/server.rb:8:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/worker_context.rb:9:in `wrap_in_optional_context' from gitlab/sidekiq_middleware/worker_context/server.rb:19:in `block in call' from gitlab/application_context.rb:124:in `block in use' from gitlab-labkit (0.34.0) lib/labkit/context.rb:35:in `with_context' from gitlab/application_context.rb:124:in `use' from gitlab/application_context.rb:62:in `with_context' from gitlab/sidekiq_middleware/worker_context/server.rb:17:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_status/server_middleware.rb:7:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_versioning/middleware.rb:9:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/query_analyzer.rb:7:in `block in call' from gitlab/database/query_analyzer.rb:37:in `within' from gitlab/sidekiq_middleware/query_analyzer.rb:7:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/admin_mode/server.rb:14:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/instrumentation_logger.rb:9:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/batch_loader.rb:7:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/extra_done_log_metadata.rb:7:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/request_store_middleware.rb:8:in `block in call' from gitlab/safe_request_store.rb:66:in `enabling_request_store' from gitlab/safe_request_store.rb:59:in `ensure_request_store' from gitlab/sidekiq_middleware/request_store_middleware.rb:7:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/server_metrics.rb:94:in `block in call' from gitlab/sidekiq_middleware/server_metrics.rb:122:in `block in instrument' from gitlab/metrics/background_transaction.rb:33:in `run' from gitlab/sidekiq_middleware/server_metrics.rb:122:in `instrument' from gitlab/sidekiq_middleware/server_metrics.rb:93:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab-labkit (0.34.0) lib/labkit/middleware/sidekiq/server.rb:21:in `block in call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:177:in `block in invoke' from gitlab-labkit (0.34.0) lib/labkit/middleware/sidekiq/context/server.rb:16:in `block in call' from gitlab-labkit (0.34.0) lib/labkit/context.rb:35:in `with_context' from gitlab-labkit (0.34.0) lib/labkit/middleware/sidekiq/context/server.rb:15:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:182:in `invoke' from gitlab-labkit (0.34.0) lib/labkit/middleware/sidekiq/server.rb:20:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/monitor.rb:10:in `block in call' from gitlab/sidekiq_daemon/monitor.rb:46:in `within_job' from gitlab/sidekiq_middleware/monitor.rb:9:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from gitlab/sidekiq_middleware/size_limiter/server.rb:13:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from marginalia (1.11.1) lib/marginalia/sidekiq_instrumentation.rb:9:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from sentry-sidekiq (5.8.0) lib/sentry/sidekiq/sentry_context_middleware.rb:26:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from sentry-raven (3.1.2) lib/raven/integrations/sidekiq/cleanup_middleware.rb:7:in `call' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:179:in `block in invoke' from sidekiq (6.5.7) lib/sidekiq/middleware/chain.rb:182:in `invoke' from sidekiq (6.5.7) lib/sidekiq/processor.rb:169:in `block in process' from sidekiq (6.5.7) lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch' from sidekiq (6.5.7) lib/sidekiq/job_retry.rb:113:in `local' from sidekiq (6.5.7) lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch' from sidekiq (6.5.7) lib/sidekiq/rails.rb:14:in `block in call' from activesupport (7.0.6) lib/active_support/execution_wrapper.rb:92:in `wrap' from activesupport (7.0.6) lib/active_support/reloader.rb:72:in `block in wrap' from activesupport (7.0.6) lib/active_support/execution_wrapper.rb:92:in `wrap' from activesupport (7.0.6) lib/active_support/reloader.rb:71:in `wrap' from sidekiq (6.5.7) lib/sidekiq/rails.rb:13:in `call' from sidekiq (6.5.7) lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch' from sidekiq (6.5.7) lib/sidekiq/processor.rb:263:in `stats' from sidekiq (6.5.7) lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch' from gitlab/sidekiq_logging/structured_logger.rb:21:in `call' from sidekiq (6.5.7) lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch' from sidekiq (6.5.7) lib/sidekiq/job_retry.rb:80:in `global' from sidekiq (6.5.7) lib/sidekiq/processor.rb:124:in `block in dispatch' from sidekiq (6.5.7) lib/sidekiq/job_logger.rb:39:in `prepare' from sidekiq (6.5.7) lib/sidekiq/processor.rb:123:in `dispatch' from sidekiq (6.5.7) lib/sidekiq/processor.rb:168:in `process' from sidekiq (6.5.7) lib/sidekiq/processor.rb:78:in `process_one' from sidekiq (6.5.7) lib/sidekiq/processor.rb:68:in `run' from sidekiq (6.5.7) lib/sidekiq/component.rb:8:in `watchdog' from sidekiq (6.5.7) lib/sidekiq/component.rb:17:in `block in safe_thread'
@ssoloch Interesting, I don't see an infinite loop or something like that there
🤔
I've increased MRI stack size a few hours ago and haven't seen another one yet - while in the past there were 8-9 errors per hour (matching number of projects still failing to sync). Hard to say if it helped with anything, there are a lot of strange errors in this release - various API calls on primary are timing out on Gitaly client mutex, some Geo requests are timing out, on restart deadlock exceptions from recursive locking are raised... Will check tomorrow if number of failed projects decreased.
Thanks for trying that. I suspect the stack is basically legitimate, but perhaps we (GitLab) need to take action to increase stack size for everyone if that's possible, or else reduce the produced stack. I am not really sure yet, as it's not something I've dealt with before. I'll open an issue to track this.