[Fleet Visibility] Backfill CI job performance materialized view
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Summary
Backfill the ci_job_performance_daily_mv materialized view with historical CI job performance data.
Background
Part of the implementation for &18548. After creating the materialized view in #555974 (closed), we need to backfill it with historical data.
Blocked by #555974 (closed) - The materialized view must be created before we can backfill it.
Implementation Details
Backfill Query
The backfill should be executed using the following SQL:
INSERT INTO gitlab_clickhouse_development.ci_job_performance_daily_mv
SELECT
toDate(b.finished_at) AS date,
b.project_id,
b.stage_id,
b.name,
p.source,
p.ref,
quantilesState(0.5, 0.95)(b.duration) AS duration_quantiles,
countState() AS total_builds,
countStateIf(b.status = 'failed') AS failed_builds
FROM gitlab_clickhouse_development.ci_finished_builds b
INNER JOIN gitlab_clickhouse_development.ci_finished_pipelines p
ON b.pipeline_id = p.id
WHERE b.finished_at > 0
AND b.finished_at >= today() - INTERVAL 180 DAY
GROUP BY date, b.project_id, b.stage_id, b.name, p.source, p.ref;
Backfill Strategy
For large-scale deployments, consider backfilling in chunks by month to avoid overwhelming the system:
-- Example: Backfill one month at a time
INSERT INTO gitlab_clickhouse_development.ci_job_performance_daily_mv
SELECT
toDate(b.finished_at) AS date,
b.project_id,
b.stage_id,
b.name,
p.source,
p.ref,
quantilesState(0.5, 0.95)(b.duration) AS duration_quantiles,
countState() AS total_builds,
countStateIf(b.status = 'failed') AS failed_builds
FROM gitlab_clickhouse_development.ci_finished_builds b
INNER JOIN gitlab_clickhouse_development.ci_finished_pipelines p
ON b.pipeline_id = p.id
WHERE b.finished_at > 0
AND b.finished_at >= '2025-01-01'
AND b.finished_at < '2025-02-01'
GROUP BY date, b.project_id, b.stage_id, b.name, p.source, p.ref;
Considerations
- Time Range: Default to backfilling the last 180 days to support the largest time window in the UI
- Performance: Monitor ClickHouse cluster performance during backfill
- Chunking: For GitLab.com, may need to backfill in smaller chunks (e.g., by week or by project ranges)
- Verification: After backfill, verify data integrity with sample queries
Acceptance Criteria
-
Backfill script is created and tested -
Historical data for the last 180 days is loaded into the MV -
Performance impact during backfill is acceptable -
Data integrity is verified post-backfill
Verification Query
After backfill, verify the data with:
-- Check data distribution by date
SELECT
toStartOfMonth(date) AS month,
count() AS num_days,
sum(countMerge(total_builds)) AS total_builds
FROM gitlab_clickhouse_development.ci_job_performance_daily_mv
GROUP BY month
ORDER BY month DESC;
Note: This issue was created by Claude based on implementation discussion in epic &18548
Edited by 🤖 GitLab Bot 🤖