Disable Sidekiq retries for ClickHouse pipeline/build sync workers

What does this MR do and why?

Disables Sidekiq retries for ClickHouse pipeline/build sync workers to prevent duplicate data insertion.

Problem

When Net::ReadTimeout occurs during ClickHouse INSERT operations (see the sketch after this list):

  1. ClickHouse receives and processes the INSERT successfully
  2. The HTTP response times out before reaching the client
  3. The worker raises an exception before marking the sync events as processed
  4. The events remain processed: false in PostgreSQL
  5. On retry, the same events are sent to ClickHouse again → duplicates
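
A simplified, hypothetical sketch of that sequence; the model, helper, and client calls below are placeholders rather than the actual sync code:

```ruby
# Illustrative only: PipelineSyncEvent, build_csv, and ClickHouse::Client.insert_csv
# are placeholders standing in for the real sync code paths.
BATCH_SIZE = 5_000

def perform
  events = PipelineSyncEvent.where(processed: false).limit(BATCH_SIZE)
  csv = build_csv(events)

  # ClickHouse may receive and apply this INSERT, yet the HTTP response can
  # still time out, raising Net::ReadTimeout here...
  ClickHouse::Client.insert_csv('ci_finished_pipelines', csv)

  # ...so this line never runs, the events stay processed: false in PostgreSQL,
  # and a Sidekiq retry re-sends the very same rows to ClickHouse.
  events.update_all(processed: true)
end
```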

The source table (ci_finished_pipelines) uses ReplacingMergeTree, which eventually deduplicates rows during background merges. However, the materialized view (MV) target tables (ci_finished_pipelines_daily, ci_finished_pipelines_hourly) use AggregatingMergeTree, which accumulates every INSERT without deduplication.

Retry amplification

Sidekiq's default maximum retry count is 25. If each retry experiences the same timeout pattern (successful INSERT followed by timeout), the same data could be inserted up to 25 times, causing massive inflation in the aggregation tables.

Solution

Setting retry: false prevents this amplification. The cron jobs run every 3 minutes, so any failed work will be picked up by the next scheduled run.

Also changes data_consistency to :sticky so that reads fall back to the primary database when replicas lag, ensuring recently created sync events aren't missed (required when retries are disabled).
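
A minimal sketch of what a worker declaration could look like with these settings; the class and module names are illustrative and not the exact workers touched by this MR:

```ruby
# Sketch only: the worker and module names are illustrative; the retry and
# data_consistency settings are the ones this MR changes.
module Ci
  class ClickHouseFinishedPipelinesSyncWorker
    include ApplicationWorker
    include CronjobQueue

    # A Sidekiq retry could re-insert a batch that ClickHouse already applied,
    # so rely on the 3-minute cron cadence instead of retries.
    sidekiq_options retry: false

    # Use the primary (or a fully caught-up replica) so a run does not miss
    # sync events created just before it was scheduled.
    data_consistency :sticky

    idempotent!

    def perform
      # Send unprocessed sync events to ClickHouse, then mark them processed.
    end
  end
end
```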

CSV size analysis

We measured the CSV upload size for a full batch of 5,000 pipelines (a measurement sketch follows the table):

Metric | Value
--- | ---
Rows per upload | 5,000
Uncompressed CSV size | ~936 KB
Gzipped size | ~169 KB
Bytes per row | ~192 bytes
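
For reference, a throwaway script along the following lines can reproduce this kind of measurement; the row contents are invented and only approximate real pipeline rows:

```ruby
# Rough local measurement sketch; the row shape is made up and only meant to
# approximate the ~192 bytes/row figure above.
require 'csv'
require 'zlib'
require 'time'

rows = Array.new(5_000) do |i|
  [i, 9_999_000 + i, 'success', Time.now.utc.iso8601, 'gitlab-org/gitlab']
end

csv = CSV.generate { |out| rows.each { |row| out << row } }
gzipped = Zlib.gzip(csv)

puts "rows:         #{rows.size}"
puts "uncompressed: #{csv.bytesize} bytes"
puts "gzipped:      #{gzipped.bytesize} bytes"
puts "per row:      #{csv.bytesize / rows.size} bytes"
```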

The CSV size is trivial and not the likely cause of timeouts. The Net::ReadTimeout is probably caused by:

  • ClickHouse processing time with wait_for_async_insert=1
  • Network latency to ClickHouse Cloud
  • The 20-second default read_timeout being too short under load

Part of #586319.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
