Sync stage_name to ClickHouse for job analytics grouping

Summary

Currently, the job analytics feature cannot properly group by stage because each pipeline run creates a new stage record in p_ci_stages, making every stage_id in ci_finished_builds unique. To enable proper stage-based grouping in job analytics, we need to sync stage_name to ClickHouse.

Problem Details

Proposed Solution

Sync the stage_name to ClickHouse and use it to group the data instead of stage_id.

Implementation Approaches Discussed

Option 1: Sync p_ci_stages table to ClickHouse

  • Leverage Syphon for syncing the new table
  • Only sync p_ci_stages records from the last 6 months
  • Sync newer p_ci_stages along with FinishedPipelineChSyncEvent sync
  • Include created_at column to allow dropping older stages from CH after 6 months if needed
  • Update query to use INNER JOIN with ci_stages table

Example query structure:

SELECT 
    s.stage_name,
    b.name, 
    avg(b.duration) / 1000.0 AS mean_duration_in_seconds, 
    quantile(0.95)(b.duration) / 1000.0 AS p95_duration_in_seconds, 
    countIf(b.status = 'success') / count() * 100 AS rate_of_success 
FROM ci_finished_builds b
INNER JOIN ci_stages s ON b.stage_id = s.id
WHERE 
    b.project_id = 278964 AND
    b.pipeline_id IN (
        SELECT p.id 
        FROM ci_finished_pipelines p 
        WHERE p.path = '9970/15846663/'
        AND p.finished_at >= today() - INTERVAL 30 DAY
        AND p.finished_at <= now()
    )
GROUP BY s.stage_name, b.name
ORDER BY mean_duration_in_seconds DESC, s.stage_name ASC, b.name ASC
LIMIT 20;

Option 2: Row versioning in ci_finished_builds (less preferred)

  • Add ci_finished_builds.inserted_at column for versioning
  • Insert duplicates with correct stage_name instead of updating
  • Change all read operations to only consider the record with latest inserted_at for each id
  • Loop through all ci_finished_builds records with finished_at in last 6 months
  • Note: Updates are resource-intensive in ClickHouse, and row versioning adds complexity
Edited by 🤖 GitLab Bot 🤖