Ci::ClickHouse::FinishedPipelinesSyncWorker sometimes fails with timeout (ActiveRecord::QueryCanceled)
Because our logic supports dispatching multiple parallel workers to ingest pipelines and builds, we use Gitlab::Pagination::Keyset::Iterator to generate an optimized query. Unfortunately, the generated query is very complex and regularly times out over the course of a typical day:
https://log.gprd.gitlab.net/app/r/s/H7iTj
Given that we've seen proof that a single worker is more than capable of handling the current peak loads of .com pipelines, we should be able to simplify the query drastically when there is a single worker running, thereby gaining even more headroom.
Implementation plan
- Add migration to implement new index for pipeline_id WHERE processed = false, so that we don't rely on the (pipeline_id % 100::bigint) index.
- Add feature flag to control whether to use the old query or the new one.
- Add new query that doesn't rely on Gitlab::Pagination::Keyset::Iterator and instead relies on the EachBatch module, when there is a single worker active.
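To make the intent of the simplified single-worker query concrete, here is a minimal plain-Ruby sketch of the access pattern it is meant to produce. It is not GitLab code: SyncEvent, each_unprocessed_batch and sync_single_worker are hypothetical names, and an in-memory array stands in for the sync events table. The assumption is that each step only needs the unprocessed rows above the last seen pipeline_id, in order, which is what an index on pipeline_id WHERE processed = false could serve directly.

```ruby
# Hypothetical in-memory stand-in for a sync events row; not GitLab code.
SyncEvent = Struct.new(:pipeline_id, :processed)

# Walks unprocessed events in ascending pipeline_id order, one batch at a time.
# Each step corresponds roughly to:
#   SELECT ... WHERE processed = false AND pipeline_id > :last_id
#   ORDER BY pipeline_id LIMIT :batch_size
# (an assumed query shape, not the query the real worker generates).
def each_unprocessed_batch(events, batch_size:)
  last_id = 0
  loop do
    batch = events
      .select { |event| !event.processed && event.pipeline_id > last_id }
      .sort_by(&:pipeline_id)
      .first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last.pipeline_id
  end
end

# Hypothetical single-worker sync: "ingest" each batch, then mark it processed.
def sync_single_worker(events, batch_size: 2)
  synced_ids = []
  each_unprocessed_batch(events, batch_size: batch_size) do |batch|
    synced_ids.concat(batch.map(&:pipeline_id))
    batch.each { |event| event.processed = true }
  end
  synced_ids
end
```

In the real worker, the EachBatch module would presumably take the place of this hand-written loop, and the feature flag from the plan would decide whether this path or the existing Gitlab::Pagination::Keyset::Iterator path runs.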
