Skip to content

[Fleet Visibility] Create ClickHouse materialized view for CI job performance metrics

Summary

Create a ClickHouse materialized view to aggregate CI job performance metrics including P50/P95 durations and failure rates.

Background

Part of the implementation for &18548 to add CI/CD job metrics to the CI/CD Analytics View.

Implementation Details

Create the following materialized view:

CREATE MATERIALIZED VIEW gitlab_clickhouse_development.ci_job_performance_daily_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (project_id, source, ref, name, stage_id, date)
AS
SELECT
    toDate(b.finished_at) AS date,
    b.project_id,
    b.stage_id,
    b.name,
    p.source,
    p.ref,
    quantilesState(0.5, 0.95)(b.duration) AS duration_quantiles,
    countState() AS total_builds,
    countStateIf(b.status = 'failed') AS failed_builds
FROM gitlab_clickhouse_development.ci_finished_builds b
INNER JOIN gitlab_clickhouse_development.ci_finished_pipelines p 
    ON b.pipeline_id = p.id
WHERE b.finished_at > 0  -- Ensure we have valid finished times
GROUP BY date, b.project_id, b.stage_id, b.name, p.source, p.ref;

Key Design Decisions

  • Daily granularity for flexible time window queries (7 days, 30 days, 90 days, 180 days)
  • AggregatingMergeTree engine to properly handle State functions
  • Partitioned by month for efficient data management and potential TTL
  • ORDER BY optimized for filtering by project, source, ref, and searching by job name
  • Pre-aggregated percentiles using ClickHouse's quantilesState for accuracy
  • Includes source and ref from ci_finished_pipelines table via JOIN for filtering

Acceptance Criteria

  • Materialized view is created in production ClickHouse
  • View supports queries for different time windows
  • Performance is acceptable for typical query patterns
  • Migration is reversible

Testing

Test query to validate the MV:

SELECT 
    project_id,
    stage_id,
    name,
    source,
    ref,
    countMerge(total_builds) AS total_builds,
    countMerge(failed_builds) AS failed_builds,
    quantilesMerge(0.5, 0.95)(duration_quantiles) AS duration_percentiles,
    duration_percentiles[1] AS p50_duration,
    duration_percentiles[2] AS p95_duration,
    if(total_builds > 0, failed_builds / total_builds, 0) AS failure_rate
FROM gitlab_clickhouse_development.ci_job_performance_daily_mv
WHERE date >= today() - INTERVAL 30 DAY
    AND project_id = 19
GROUP BY project_id, stage_id, name, source, ref
ORDER BY p50_duration DESC;

Note: This issue was created by Claude based on implementation discussion in epic &18548

Edited by 🤖 GitLab Bot 🤖