Add projections to ci_finished_builds, ci_finished_pipelines ch table
What does this MR do and why?
- Add projection
build_stats_by_project_pipeline_name_stagetoci_finished_buildsch table - Add projection
by_path_source_ref_finished_attoci_finished_pipelinesch table
Why this change?
- Draft: GraphQL API for Project job Analytics (!201267 - closed) - Query in this MR takes a minimum of 3-5 seconds in production
- This is because the table has no necessary indexes, projections or materialized views.
- These projections are balanced to support all the queries we support and are going to support in the above MR.
References
Add jobAnalytics GraphQL API to fetch data on o... (#541703 - closed)
How to set up and validate locally
- Run this to execute the migration
bundle exec rake gitlab:clickhouse:migrate
- Clickhouse add this projection async. So, this migration runs faster
- To validate if the job is running, run this in clickhouse console
SELECT *
FROM system.mutations
WHERE database = 'gitlab_clickhouse_development'
AND table IN ('ci_finished_builds', 'ci_finished_pipelines')
AND NOT is_done;
- Depending on the data size, it may take a while. (Took more than 1 hour with 1B records in local)
- Once the above job is done in parts. Run these queries to check the performance
-- 1. WITH projection (default - should be fast)
SELECT /* with_projection */
b.stage_id,
b.name,
avg(b.duration) / 1000.0 AS mean_duration_in_seconds,
quantile(0.95)(b.duration) / 1000.0 AS p95_duration_in_seconds,
countIf(b.status = 'success') / count() * 100 AS rate_of_success
FROM ci_finished_builds b
WHERE project_id = 278964
GROUP BY b.stage_id, b.name
ORDER BY rate_of_success DESC
LIMIT 10
SETTINGS optimize_use_projections = 1;
-- Explicitly enable
-- 2. WITHOUT projection (should be slow)
SELECT /* without_projection */
b.stage_id,
b.name,
avg(b.duration) / 1000.0 AS mean_duration_in_seconds,
quantile(0.95)(b.duration) / 1000.0 AS p95_duration_in_seconds,
countIf(b.status = 'success') / count() * 100 AS rate_of_success
FROM ci_finished_builds b
WHERE project_id = 278964
GROUP BY b.stage_id, b.name
ORDER BY rate_of_success DESC
LIMIT 10
SETTINGS optimize_use_projections = 0;
-- Disable projections
-- 3. Check the performance metrics
SELECT query,
event_time_microseconds,
query_start_time_microseconds,
query_duration_ms,
read_rows,
read_bytes,
memory_usage,
partitions,
peak_threads_usage
FROM system.query_log
WHERE type = 'QueryFinish'
AND (query LIKE '%with_projection%' OR query LIKE '%without_projection%')
AND query LIKE '%278964%'
AND event_time > now() - INTERVAL 2 MINUTE
ORDER BY event_time DESC
LIMIT 2;
- Observed results with 1B records.
| Metric | Without Projection | With Projection | Improvement |
|---|---|---|---|
| Query Duration | 1455 ms | 485 ms | 3x faster |
| Rows Read | 260,736,511 (260M) | 35,322 (35K) | 7,380x less |
| Bytes Read | 11.6 GB | 7.75 MB | 1,500x less |
| Memory Usage | 252 MB | 945 MB | Higher |
- Query Plan
EXPLAIN
SELECT b.stage_id,
b.name,
avg(b.duration) / 1000.0 AS mean_duration_in_seconds,
quantile(0.95)(b.duration) / 1000.0 AS p95_duration_in_seconds,
countIf(b.status = 'success') / count() * 100 AS rate_of_success
FROM ci_finished_builds b
WHERE project_id = 278964
GROUP BY b.stage_id, b.name
ORDER BY rate_of_success DESC
LIMIT 10
-- Result
Expression
((Project names + (Before ORDER BY + Projection) [lifted up part]))
Limit (preliminary LIMIT (without OFFSET))
Sorting
(Sorting for ORDER BY)
Expression
((Before ORDER BY + Projection))
Aggregating
Filter
ReadFromMergeTree (proj_by_name_stage)
Uncovered and needs to be discussed (and will be taken in next MRs):
- Search by job name
- Can we create an index like this?
ALTER TABLE ci_finished_builds ADD INDEX name_lowercase(lower(name)) TYPE text; ALTER TABLE ci_finished_builds MATERIALIZE INDEX name_lowercase;
- Can we create an index like this?
- Filter by pipeline attrs
- We don't have project_id in ci_finished_pipelines (so it creates search space huge)
- But, we have project_path, need to check if that helps
- Can't use hourly, daily MVs as those are aggregations and won't have pipeline id
- Once this MR is shipped, we can choose any one of the approaches below:
- Use join
- Use
IN- Sample Query:
SELECT b.stage_id,
b.name,
avg(b.duration) / 1000.0 AS mean_duration_in_seconds,
quantile(0.95)(b.duration) / 1000.0 AS p95_duration_in_seconds,
countIf(b.status = 'success') / count() * 100 AS rate_of_success
FROM ci_finished_builds b
WHERE project_id = 278964
AND pipeline_id in (SELECT id from ci_finished_pipelines where source = 'api' and ref = 'main')
GROUP BY b.stage_id, b.name
ORDER BY rate_of_success DESC
LIMIT 10
SETTINGS optimize_use_projections = 1
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Edited by Narendran