Disable Sidekiq retries for ClickHouse pipeline/build sync workers
What does this MR do and why?
Disables Sidekiq retries for ClickHouse pipeline/build sync workers to prevent duplicate data insertion.
Problem
When Net::ReadTimeout occurs during ClickHouse INSERT operations:
- ClickHouse receives and processes the INSERT successfully
- The HTTP response times out before reaching the client
- The worker raises an exception before marking the sync events as processed
- Events remain processed: false in PostgreSQL
- On retry, the same events are sent to ClickHouse again → duplicates
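The sequence above can be reproduced with a small simulation. This is a minimal sketch, not the actual worker code: FakeClickHouse, SyncEvent, and sync are hypothetical stand-ins, and the only real constant used is Net::ReadTimeout.

```ruby
require 'net/protocol' # defines Net::ReadTimeout

# Hypothetical stand-in for ClickHouse: applies the INSERT, then loses the response.
class FakeClickHouse
  attr_reader :rows

  def initialize(timeouts:)
    @rows = []
    @timeouts_left = timeouts
  end

  def insert(batch)
    @rows.concat(batch)            # the INSERT is applied server-side
    return if @timeouts_left.zero?

    @timeouts_left -= 1
    raise Net::ReadTimeout         # the client never sees the response
  end
end

# Hypothetical stand-in for the sync events stored in PostgreSQL.
SyncEvent = Struct.new(:pipeline_id, :processed)

def sync(events, clickhouse)
  pending = events.reject(&:processed)
  clickhouse.insert(pending.map(&:pipeline_id))
  pending.each { |event| event.processed = true } # skipped when insert raises
end

events = [SyncEvent.new(1, false), SyncEvent.new(2, false)]
clickhouse = FakeClickHouse.new(timeouts: 1)

begin
  sync(events, clickhouse)
rescue Net::ReadTimeout
  # The events are still processed: false, so the retry resends them.
  sync(events, clickhouse)
end

puts clickhouse.rows.inspect # => [1, 2, 1, 2]
```

A single lost response is enough to store every row in the batch twice.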
The source table (ci_finished_pipelines) uses ReplacingMergeTree, which eventually deduplicates rows during background merges. However, the MV target tables (ci_finished_pipelines_daily, ci_finished_pipelines_hourly) use AggregatingMergeTree, which accumulates every INSERT without deduplication.
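The difference between the two engines can be illustrated with a simplified in-memory model. This is a sketch under assumptions: the id and finished_on columns and the sample date are made up for illustration, and real ClickHouse merges happen asynchronously in the background rather than at insert time.

```ruby
# Simplified models of the state after background merges have run.
replacing = {}              # ReplacingMergeTree: one surviving row per key
aggregating = Hash.new(0)   # AggregatingMergeTree MV target: pipeline count per day

insert = lambda do |rows|
  rows.each do |row|
    replacing[row[:id]] = row             # a later duplicate replaces the earlier row
    aggregating[row[:finished_on]] += 1   # the MV aggregates every inserted row
  end
end

# Hypothetical sample batch.
batch = [
  { id: 1, finished_on: '2024-01-01' },
  { id: 2, finished_on: '2024-01-01' }
]

insert.call(batch)
insert.call(batch) # the same batch resent by a retry

puts replacing.size             # => 2 (deduplicated)
puts aggregating['2024-01-01']  # => 4 (double-counted)
```

The source table converges back to two rows, while the aggregate keeps the inflated count.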
Retry amplification
Sidekiq's default maximum retry count is 25. If each retry experiences the same timeout pattern (successful INSERT followed by timeout), the same data could be inserted up to 26 times (the initial attempt plus 25 retries), causing massive inflation in the aggregation tables.
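As a rough worst-case estimate, counting the initial run as well as the retries, and assuming every attempt inserts a full batch of 5,000 pipelines (the batch size used in the CSV size analysis) before timing out:

```ruby
SIDEKIQ_DEFAULT_RETRIES = 25
BATCH_SIZE = 5_000 # full batch size, taken from the CSV size analysis

attempts = 1 + SIDEKIQ_DEFAULT_RETRIES   # initial attempt plus retries
worst_case_rows = attempts * BATCH_SIZE  # rows aggregated for 5,000 real pipelines

puts attempts         # => 26
puts worst_case_rows  # => 130000
```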
Solution
Setting retry: false prevents this amplification. The cron jobs run every 3 minutes, so any failed work will be picked up by the next scheduled run.
Also changes data_consistency to :sticky to ensure we read from the primary database and don't miss recently created sync events (required when disabling retries).
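The resulting worker configuration might look roughly like the sketch below. The WorkerDsl module is a minimal stand-in so the example runs without Sidekiq or GitLab loaded, and ExamplePipelineSyncWorker is a hypothetical name, not one of the workers changed in this MR.

```ruby
# Minimal stand-in for the worker class-level DSL (assumption: not the real implementation).
module WorkerDsl
  def sidekiq_options(opts = nil)
    @sidekiq_options = (@sidekiq_options || {}).merge(opts) if opts
    @sidekiq_options || {}
  end

  def data_consistency(value = nil)
    @data_consistency = value if value
    @data_consistency
  end
end

# Hypothetical worker name, used for illustration only.
class ExamplePipelineSyncWorker
  extend WorkerDsl

  sidekiq_options retry: false # no retries; the next cron run picks up unprocessed events
  data_consistency :sticky

  def perform
    # Sync unprocessed events to ClickHouse (omitted in this sketch).
  end
end

puts ExamplePipelineSyncWorker.sidekiq_options.inspect
puts ExamplePipelineSyncWorker.data_consistency.inspect
```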
CSV size analysis
Measured the CSV upload size for a full batch of 5,000 pipelines:
| Metric | Value |
|---|---|
| Rows per upload | 5,000 |
| Uncompressed CSV size | ~936 KB |
| Gzipped size | ~169 KB |
| Bytes per row | ~192 bytes |
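A measurement of this kind can be sketched as follows. The rows here are synthetic and the columns are made up, so the printed sizes will not match the table; the sketch only shows how the uncompressed, gzipped, and per-row figures relate.

```ruby
require 'zlib'

# Synthetic rows with hypothetical columns; real pipeline rows differ.
rows = Array.new(5_000) do |i|
  [i, "project-#{i % 50}", 'success', 1_700_000_000 + i, 120 + (i % 300)].join(',')
end

csv = rows.join("\n") + "\n"
gzipped = Zlib.gzip(csv)

puts "uncompressed: #{csv.bytesize} bytes"
puts "gzipped:      #{gzipped.bytesize} bytes"
puts "per row:      #{(csv.bytesize.to_f / rows.size).round} bytes"
```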
The CSV size is trivial and not the likely cause of timeouts. The Net::ReadTimeout is probably caused by:
- ClickHouse processing time with wait_for_async_insert=1
- Network latency to ClickHouse Cloud
- The 20-second default read_timeout being too short under load
Related issues
- Part of #586319
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.