Auto-canceling pipelines times out, causing the new pipeline not to be processed
### Summary
Creating this issue after spotting these recently introduced errors in Sentry: https://sentry.gitlab.net/gitlab/gitlabcom/issues/1909105
```
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
SELECT "ci_pipelines".* FROM "ci_pipelines"
WHERE "ci_pipelines"."project_id" = $1
AND ("ci_pipelines"."source" IN ($2, $3, $4, $5, $6, $7, $8, $9, $10, $11) OR "ci_pipelines"."source" IS NULL)
AND "ci_pipelines"."ref" = $12
AND "ci_pipelines"."id" NOT IN (
WITH RECURSIVE "base_and_descendants" AS (
(WITH RECURSIVE "base_and_ancestors" AS (
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" = $13)
UNION
(SELECT "ci_pipelines".* FROM "ci_pipelines", "base_and_ancestors", "ci_sources_pipelines"
WHERE "ci_sources_pipelines"."source_pipeline_id" = "ci_pipelines"."id"
AND "ci_sources_pipelines"."pipeline_id" = "base_and_ancestors"."id"
AND "ci_sources_pipelines"."source_project_id" = "ci_sources_pipelines"."project_id"
)
) SELECT "ci_pipelines".* FROM "base_and_ancestors" AS "ci_pipelines"
)
UNION
(SELECT "ci_pipelines".* FROM "ci_pipelines", "base_and_descendants", "ci_sources_pipelines"
WHERE "ci_sources_pipelines"."pipeline_id" = "ci_pipelines"."id"
AND "ci_sources_pipelines"."source_pipeline_id" = "base_and_descendants"."id"
AND "ci_sources_pipelines"."source_project_id" = "ci_sources_pipelines"."project_id"
)
)
SELECT "id" FROM "base_and_descendants" AS "ci_pipelines"
)
AND "ci_pipelines"."sha" != $14
AND ("ci_pipelines"."status" IN ($15,$16,$17,$18,$19,$20))
AND (NOT EXISTS (
SELECT "ci_builds".* FROM "ci_builds"
INNER JOIN "ci_builds_metadata" ON "ci_builds_metadata"."build_id" = "ci_builds"."id"
WHERE "ci_builds"."type" = $21
AND (ci_builds.commit_id = ci_pipelines.id)
AND ("ci_builds"."status" IN ($22,$23,$24))
AND (ci_builds_metadata.id NOT IN (SELECT "ci_builds_metadata"."id" FROM "ci_builds_metadata" WHERE (ci_builds_metadata.build_id = ci_builds.id)
AND "ci_builds_metadata"."interruptible" = $25))))
ORDER BY "ci_pipelines"."id" ASC LIMIT $26
from active_record/connection_adapters/postgresql_adapter.rb:675:in `exec_params'
from active_record/connection_adapters/postgresql_adapter.rb:675:in `block (2 levels) in exec_no_cache'
from active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
from active_support/concurrency/share_lock.rb:187:in `yield_shares'
from active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
from active_record/connection_adapters/postgresql_adapter.rb:674:in `block in exec_no_cache'
from active_record/connection_adapters/abstract_adapter.rb:722:in `block (2 levels) in log'
from active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize'
from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
from active_record/connection_adapters/abstract_adapter.rb:721:in `block in log'
from active_support/notifications/instrumenter.rb:24:in `instrument'
from active_record/connection_adapters/abstract_adapter.rb:712:in `log'
from active_record/connection_adapters/postgresql_adapter.rb:673:in `exec_no_cache'
from active_record/connection_adapters/postgresql_adapter.rb:654:in `execute_and_clear'
from marginalia.rb:94:in `execute_and_clear_with_marginalia'
from active_record/connection_adapters/postgresql/database_statements.rb:98:in `exec_query'
from active_record/connection_adapters/abstract/database_statements.rb:489:in `select'
from active_record/connection_adapters/abstract/database_statements.rb:70:in `select_all'
from active_record/connection_adapters/abstract/query_cache.rb:107:in `select_all'
from active_record/querying.rb:46:in `find_by_sql'
from active_record/relation.rb:821:in `block in exec_queries'
from active_record/relation.rb:836:in `block in skip_query_cache_if_necessary'
from active_record/connection_adapters/abstract/query_cache.rb:79:in `uncached'
from active_record/query_cache.rb:21:in `uncached'
from active_record/relation/delegation.rb:66:in `block in uncached'
from active_record/relation.rb:407:in `block in scoping'
from active_record/relation.rb:784:in `_scoping'
from active_record/relation.rb:407:in `scoping'
from active_record/relation/delegation.rb:66:in `uncached'
from active_record/relation.rb:835:in `skip_query_cache_if_necessary'
from active_record/relation.rb:808:in `exec_queries'
from active_record/association_relation.rb:36:in `exec_queries'
from active_record/relation.rb:626:in `load'
from active_record/relation.rb:250:in `records'
from active_record/relation/batches.rb:224:in `block in in_batches'
from active_record/relation/batches.rb:222:in `loop'
from active_record/relation/batches.rb:222:in `in_batches'
from active_record/relation/batches.rb:135:in `find_in_batches'
from active_record/relation/batches.rb:69:in `find_each'
from gitlab/ci/pipeline/chain/cancel_pending_pipelines.rb:14:in `block in perform!'
from gitlab/optimistic_locking.rb:11:in `block in retry_lock'
from active_record/connection_adapters/abstract/database_statements.rb:280:in `block in transaction'
from active_record/connection_adapters/abstract/transaction.rb:280:in `block in within_new_transaction'
from active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize'
from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
from active_record/connection_adapters/abstract/transaction.rb:278:in `within_new_transaction'
from active_record/connection_adapters/abstract/database_statements.rb:280:in `transaction'
from active_record/transactions.rb:212:in `transaction'
from gitlab/optimistic_locking.rb:10:in `retry_lock'
from gitlab/ci/pipeline/chain/cancel_pending_pipelines.rb:13:in `perform!'
from gitlab/ci/pipeline/chain/sequence.rb:19:in `block in build!'
from gitlab/ci/pipeline/chain/sequence.rb:16:in `each'
from gitlab/ci/pipeline/chain/sequence.rb:16:in `build!'
from ci/create_pipeline_service.rb:81:in `execute'
from ci/create_pipeline_service.rb:93:in `execute!'
from run_pipeline_schedule_worker.rb:25:in `run_pipeline_schedule'
from run_pipeline_schedule_worker.rb:17:in `perform'
from sidekiq/processor.rb:192:in `execute_job'
from sidekiq/processor.rb:165:in `block (2 levels) in process'
from sidekiq/middleware/chain.rb:128:in `block in invoke'
from gitlab/metrics/sidekiq_middleware.rb:18:in `block in call'
from gitlab/metrics/transaction.rb:61:in `run'
from gitlab/metrics/sidekiq_middleware.rb:18:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/duplicate_jobs/strategies/until_executing.rb:16:in `perform'
from gitlab/sidekiq_middleware/duplicate_jobs/duplicate_job.rb:40:in `perform'
from gitlab/sidekiq_middleware/duplicate_jobs/server.rb:8:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/worker_context.rb:9:in `wrap_in_optional_context'
from gitlab/sidekiq_middleware/worker_context/server.rb:17:in `block in call'
from gitlab/application_context.rb:54:in `block in use'
from labkit/context.rb:32:in `with_context'
from gitlab/application_context.rb:54:in `use'
from gitlab/application_context.rb:21:in `with_context'
from gitlab/sidekiq_middleware/worker_context/server.rb:15:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_status/server_middleware.rb:7:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_versioning/middleware.rb:9:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/admin_mode/server.rb:8:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/instrumentation_logger.rb:7:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from labkit/middleware/sidekiq/server.rb:21:in `block in call'
from sidekiq/middleware/chain.rb:128:in `block in invoke'
from labkit/middleware/sidekiq/context/server.rb:14:in `block in call'
from labkit/context.rb:32:in `with_context'
from labkit/middleware/sidekiq/context/server.rb:13:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from sidekiq/middleware/chain.rb:133:in `invoke'
from labkit/middleware/sidekiq/server.rb:20:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/batch_loader.rb:7:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/extra_done_log_metadata.rb:7:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/request_store_middleware.rb:10:in `block in call'
from gitlab/with_request_store.rb:17:in `enabling_request_store'
from gitlab/with_request_store.rb:10:in `with_request_store'
from gitlab/sidekiq_middleware/request_store_middleware.rb:9:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/server_metrics.rb:35:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from gitlab/sidekiq_middleware/monitor.rb:8:in `block in call'
from gitlab/sidekiq_daemon/monitor.rb:49:in `within_job'
from gitlab/sidekiq_middleware/monitor.rb:7:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from marginalia/sidekiq_instrumentation.rb:10:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from raven/integrations/sidekiq.rb:9:in `call'
from sidekiq/middleware/chain.rb:130:in `block in invoke'
from sidekiq/middleware/chain.rb:133:in `invoke'
from sidekiq/processor.rb:164:in `block in process'
from sidekiq/processor.rb:137:in `block (6 levels) in dispatch'
from sidekiq/job_retry.rb:109:in `local'
from sidekiq/processor.rb:136:in `block (5 levels) in dispatch'
from sidekiq/rails.rb:43:in `block in call'
from active_support/execution_wrapper.rb:88:in `wrap'
from active_support/reloader.rb:72:in `block in wrap'
from active_support/execution_wrapper.rb:88:in `wrap'
from active_support/reloader.rb:71:in `wrap'
from sidekiq/rails.rb:42:in `call'
from sidekiq/processor.rb:132:in `block (4 levels) in dispatch'
from sidekiq/processor.rb:250:in `stats'
from sidekiq/processor.rb:127:in `block (3 levels) in dispatch'
from gitlab/sidekiq_logging/structured_logger.rb:18:in `call'
from sidekiq/processor.rb:126:in `block (2 levels) in dispatch'
from sidekiq/job_retry.rb:74:in `global'
from sidekiq/processor.rb:125:in `block in dispatch'
from sidekiq/logging.rb:48:in `with_context'
from sidekiq/logging.rb:42:in `with_job_hash_context'
from sidekiq/processor.rb:124:in `dispatch'
from sidekiq/processor.rb:163:in `process'
from sidekiq/processor.rb:83:in `process_one'
from sidekiq/processor.rb:71:in `run'
from sidekiq/util.rb:16:in `watchdog'
from sidekiq/util.rb:25:in `block in safe_thread'
```
### Steps to reproduce
Haven't reproduced it yet, but it should be possible by:
- having auto-canceling of pipelines enabled
- creating a parent pipeline that spawns several child pipelines
- before the parent pipeline completes, creating a new pipeline for the same ref (with a new SHA, since auto-cancel skips pipelines with the same SHA), which should cause the cancellation of the old pipeline and its child pipelines (see the console sketch below)
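A rough Rails-console sketch of the scenario, assuming a project with **Auto-cancel redundant pipelines** enabled and a `.gitlab-ci.yml` containing `trigger` jobs that spawn child pipelines (the project path, username, and ref are placeholders):

```ruby
project = Project.find_by_full_path('group/project') # placeholder path
user    = User.find_by_username('root')              # placeholder user

# First pipeline: its trigger jobs are assumed to spawn child pipelines.
Ci::CreatePipelineService.new(project, user, ref: 'master').execute(:web)

# Push a new commit to the ref, then create a second pipeline for it.
# Chain::CancelPendingPipelines now has to cancel the whole family of the
# first pipeline, which runs the query that times out above.
Ci::CreatePipelineService.new(project, user, ref: 'master').execute(:web)
```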
### Example Project
### What is the current bug behavior?
An error is raised and the user gets an `Undefined error (<correlation_id>)` message back with a 500 response.
The user may see the new pipeline created successfully, but it won't start, because auto-canceling the old pipelines fails before we start processing the new one.
### What is the expected correct behavior?
The pipeline should be created and the old pipelines successfully canceled.
### Relevant logs and/or screenshots
https://sentry.gitlab.net/gitlab/gitlabcom/issues/1909105
### Output of checks
This bug happens on GitLab.com
### Possible fixes
The query that times out lives in the `Gitlab::Ci::Pipeline::Chain::CancelPendingPipelines` step and could be related to the recent changes to `Ci::Pipeline#same_family_pipeline_ids`, where we use the new `PipelineObjectHierarchy`.
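For context, a rough sketch of what `Ci::Pipeline#same_family_pipeline_ids` looks like, reconstructed from the query above (treat the exact method names and options as assumptions):

```ruby
# app/models/ci/pipeline.rb -- approximate shape, not an exact copy
def same_family_pipeline_ids
  ::Gitlab::Ci::PipelineObjectHierarchy.new(
    base_and_ancestors, options: { same_project: true }
  ).base_and_descendants.select(:id)
end
```

This nests one recursive CTE (`base_and_ancestors`) inside another (`base_and_descendants`), which matches the doubly recursive statement above and is a plausible reason the planner struggles with it.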
- If the issue is related to the fact that, in order to find all pipelines in the hierarchy, we look for a given pipeline's ancestors first and then all their descendants, maybe we could fire those queries separately rather than having one very large statement (see the first sketch after this list). MR: !46575 (merged)
- Query `with_only_interruptible_builds` in batches. Example of code in #273733 (comment 439759339) (see the second sketch after this list)
- Can `Chain::CancelPendingPipelines` perform this work asynchronously (see the third sketch after this list)? Is it necessary to cancel pending pipelines before processing a new one? Currently this step, being synchronous, sits on the critical path of pipeline creation.
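A minimal sketch of the first option: materialize the ancestor IDs on their own, then run the descendants query against those IDs, so the two recursive CTEs never end up nested in a single statement (the exact `PipelineObjectHierarchy` options are assumptions):

```ruby
# Step 1: resolve the ancestors of the triggering pipeline separately.
ancestor_ids = ::Gitlab::Ci::PipelineObjectHierarchy
  .new(Ci::Pipeline.where(id: pipeline.id), options: { same_project: true })
  .base_and_ancestors
  .pluck(:id)

# Step 2: resolve descendants starting from the already-materialized IDs.
family_ids = ::Gitlab::Ci::PipelineObjectHierarchy
  .new(Ci::Pipeline.where(id: ancestor_ids), options: { same_project: true })
  .base_and_descendants
  .pluck(:id)
```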
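For the second option, a sketch of batching the cancellation, assuming the relation supports the existing `EachBatch` helper (see the linked comment for the original idea):

```ruby
# Walk candidate pipelines in primary-key batches instead of filtering
# on interruptible builds inside one large statement.
auto_cancelable_pipelines.each_batch(of: 100) do |batch|
  batch.with_only_interruptible_builds.each do |cancelable|
    cancelable.auto_cancel_running(pipeline)
  end
end
```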
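And for the third option, a hypothetical worker (the name and wiring are invented for illustration) that would take the cancellation off the critical path:

```ruby
# app/workers/... -- hypothetical, does not exist in the codebase today
class CancelRedundantPipelinesWorker
  include ApplicationWorker

  idempotent!

  def perform(pipeline_id)
    pipeline = Ci::Pipeline.find_by_id(pipeline_id)
    return unless pipeline

    # Same cancellation logic Chain::CancelPendingPipelines runs today,
    # just outside of Ci::CreatePipelineService#execute.
  end
end
```

`Ci::CreatePipelineService` would then only enqueue the job, so a slow family query could no longer fail pipeline creation with a 500.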