Ci::ClickHouse::FinishedPipelinesSyncWorker sometimes fails with timeout (ActiveRecord::QueryCanceled)

To support dispatching multiple parallel workers that ingest pipelines and builds, we use Gitlab::Pagination::Keyset::Iterator to generate an optimized query. Unfortunately, the generated query is very complex and regularly times out during a typical day:

https://log.gprd.gitlab.net/app/r/s/H7iTj

Given that we've seen proof that a single worker is more than capable of handling the current peak loads of .com pipelines, we should be able to simplify the query drastically when there is a single worker running, thereby gaining even more headroom.

Implementation plan

  • Add migration to implement new index for pipeline_id WHERE processed = false, so that we don't rely on the (pipeline_id % 100::bigint) index.
  • Add feature flag to control whether to use the old query or the new one
  • Add new query that doesn't rely on Gitlab::Pagination::Keyset::Iterator and instead relies on the EachBatch module, when there is a single worker active.
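
To illustrate the shape of the simplified single-worker iteration, here is a minimal plain-Ruby sketch. It is not the actual EachBatch implementation or the worker's code: `SyncEvent` and `each_unprocessed_batch` are hypothetical names, the rows live in memory instead of the database, and pipeline_id values are assumed to be unique. It only shows the idea of walking unprocessed rows in pipeline_id order with a simple cursor, which is the access pattern an index on pipeline_id WHERE processed = false would serve.

```ruby
# Hypothetical stand-in for a row in the sync-events table (not a real model).
SyncEvent = Struct.new(:pipeline_id, :processed)

# Yields batches of unprocessed events in ascending pipeline_id order.
# The cursor is simply "pipeline_id greater than the last one seen",
# so pipeline_id values are assumed to be unique.
def each_unprocessed_batch(events, batch_size:)
  return enum_for(:each_unprocessed_batch, events, batch_size: batch_size) unless block_given?

  last_id = nil
  loop do
    batch = events
      .select { |e| !e.processed && (last_id.nil? || e.pipeline_id > last_id) }
      .sort_by(&:pipeline_id)
      .first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last.pipeline_id
  end
end

# Usage: five events, one of which (pipeline_id 4) is already processed.
events = [5, 1, 4, 2, 3].map { |id| SyncEvent.new(id, id == 4) }
batches = each_unprocessed_batch(events, batch_size: 2).map { |b| b.map(&:pipeline_id) }
```

With a single worker there is no need to partition the key space between workers, so each step reduces to one range condition on pipeline_id plus the processed = false filter.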