Fix Failure rate and Success rate denominator on CI/CD analytics page
What does this MR do and why?
Fixes the Failure rate and Success rate on the project CI/CD analytics page (<project>/-/pipelines/charts). Previously both rates used a denominator that included canceled and skipped, which silently diluted them — a real example from gitlab-org/gitlab showed Failure rate 7% + Success rate 91% = 98% (the missing 2% was the canceled/skipped slice absorbed into the denominator).
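The missing slice follows directly from the old formula. As a short derivation, with S, F and O as illustrative symbols (not names from the codebase) for the success, failed and canceled/skipped counts:

```math
\frac{F}{S+F+O} + \frac{S}{S+F+O} = 1 - \frac{O}{S+F+O}
\qquad\text{versus}\qquad
\frac{F}{S+F} + \frac{S}{S+F} = 1 \quad (S+F > 0)
```

In the gitlab-org/gitlab example, 7% + 91% = 98% means O / (S + F + O) was roughly 2%. With the conclusive-only denominator on the right, the two rates sum to 100% whenever at least one run succeeded or failed.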
After this MR:
- `SUCCESS` and `FAILED` rates are computed against `success + failed` (conclusive outcomes only). The two rates always sum to ~100% on projects with no `OTHER` jobs/pipelines.
- `OTHER` rate (canceled/skipped) keeps the total denominator, because "% of all runs that were canceled" only makes sense relative to the total. This preserves the `CANCELED_RATE_*` sort order semantics for external API consumers.
- A "How this is calculated?" info-o popover is added next to the Failure rate KPI and the Failure rate (%) column header in the Jobs panel so users see the formula in-context. UX approved here.
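The denominator rule above can be sketched in plain Ruby. This is only an illustration under assumed names (`rate_for`, `CONCLUSIVE` and the `counts` hash are hypothetical); the actual backend logic is a ClickHouse aggregate and may differ in detail.

```ruby
# Hypothetical helper for illustration; these names do not come from the MR.
CONCLUSIVE = %i[success failed].freeze

# counts: Hash of status => Integer, e.g. { success: 5, failed: 5, other: 2 }
# Returns a percentage rounded to two decimals, or nil when the denominator is zero.
def rate_for(status, counts)
  conclusive = counts.fetch(:success, 0) + counts.fetch(:failed, 0)
  total = conclusive + counts.fetch(:other, 0)

  # success/failed use conclusive outcomes only; other keeps the total.
  denominator = CONCLUSIVE.include?(status) ? conclusive : total
  return nil if denominator.zero?

  (counts.fetch(status, 0) * 100.0 / denominator).round(2)
end

balanced = { success: 5, failed: 5, other: 2 }
rate_for(:success, balanced)      # => 50.0
rate_for(:failed, balanced)       # => 50.0
rate_for(:other, balanced)        # => 16.67 (2 of 12 runs)
rate_for(:success, { other: 5 })  # => nil (no conclusive outcomes)
```

Returning `nil` for an empty denominator mirrors the `NaN` to `null` behavior described for the Jobs panel, rather than reporting a misleading 0%.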
The fix lives in three layers:
- Backend (Jobs panel rates): `lib/click_house/finders/ci/concerns/finished_builds_aggregations.rb`: `build_rate_aggregate` now picks the right denominator per status. When `success + failed = 0`, ClickHouse returns `NaN`, which serializes to JSON `null`, and the frontend renders `-`.
- Frontend (Pipelines KPI strip): `pipelines_stats.vue` computes `successCount + failedCount` client-side as the rate denominator (there is no `rate` GraphQL field for pipeline analytics).
- Frontend (Jobs panel tooltip): `job_analytics_table.vue`: the per-cell tooltip `N / M` uses the same `success + failed` denominator so the displayed fraction matches the displayed rate.
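The display side of these layers can be sketched as follows. The real code lives in the Vue components, so this Ruby version is only an approximation of the behavior described above; `format_rate`, `tooltip_fraction` and `PLACEHOLDER` are hypothetical names.

```ruby
# Hypothetical helpers for illustration; the real logic lives in the Vue components.
PLACEHOLDER = '-'

# rate: percentage as a number, or nil (JSON null) when success + failed is zero.
def format_rate(rate)
  return PLACEHOLDER if rate.nil?
  return PLACEHOLDER if rate.respond_to?(:nan?) && rate.nan?

  # %g drops trailing zeros, so 80.0 renders as "80%" and 41.67 as "41.67%".
  format('%g%%', rate.round(2))
end

# Tooltip fraction "N / M": M is success + failed, so the fraction
# matches the displayed rate instead of using the total run count.
def tooltip_fraction(status_count, success:, failed:)
  "#{status_count} / #{success + failed}"
end

format_rate(80.0)                          # => "80%"
format_rate(nil)                           # => "-"
tooltip_fraction(8, success: 8, failed: 2) # => "8 / 10"
```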
The GraphQL CiJobAnalyticsStatistics.rate description is updated to document the new semantics.
References
- Closes #599923
- UX direction: #599923 (comment 3347483125, 3347717341)
Screenshots or screen recordings
| Before | After |
|---|---|
| (screenshot not included) | (screenshot not included) |
| (screenshot not included) | (screenshot not included) |
How to set up and validate locally
Paste the script below into a Rails console. Edit `project_path` to point at any non-empty project in your GDK (and optionally `test_ref` to choose the branch name the page filters by). The script creates pipelines and jobs covering every interesting rate case, syncs them to ClickHouse, and prints the URL to open.
Validation script
```ruby
# ------------------------------------------------------------------------------
# Edit these before running
# ------------------------------------------------------------------------------
project_path = 'group/project' # any non-empty project
test_ref = "rate-fix-validation-#{Time.now.to_i}" # branch ref for filtering

# ------------------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------------------
section = ->(title) { puts; puts "=" * 80; puts title; puts "=" * 80 }
info = ->(message) { puts " #{message}" }

create_builds = lambda do |count:, status:, pipeline:, stage:, name:, base_time:|
  Array.new(count) do |i|
    FactoryBot.create(
      :ci_build, status,
      project: pipeline.project, pipeline: pipeline, ci_stage: stage, name: name,
      started_at: base_time + i.seconds, finished_at: base_time + (i + 1).seconds
    )
  end
end

create_build_sync_events = lambda do |builds|
  builds.each do |build|
    next if build.finished_at.nil? # skipped builds never sync

    Ci::FinishedBuildChSyncEvent.upsert(
      { build_id: build.id, project_id: build.project_id, build_finished_at: build.finished_at },
      unique_by: [:build_id, :partition]
    )
  end
end

create_pipeline_sync_event = lambda do |pipeline|
  next if pipeline.finished_at.nil?

  Ci::FinishedPipelineChSyncEvent.upsert(
    {
      pipeline_id: pipeline.id,
      pipeline_finished_at: pipeline.finished_at,
      project_namespace_id: pipeline.project.project_namespace_id
    },
    unique_by: [:pipeline_id, :partition]
  )
end

# ------------------------------------------------------------------------------
# 1. Setup
# ------------------------------------------------------------------------------
section.call "Setup"
project = Project.find_by_full_path(project_path) || raise("Project not found: #{project_path}")
info.call "Using project: #{project.full_path} (id=#{project.id})"
raise "ClickHouse is not configured." unless Gitlab::ClickHouse.configured?

settings = ::Gitlab::CurrentSettings.current_application_settings
unless settings.use_clickhouse_for_analytics?
  info.call "Enabling use_clickhouse_for_analytics application setting"
  settings.update!(use_clickhouse_for_analytics: true)
end

Namespace.all.flat_map(&:sync_events).each { |e| ::Ci::NamespaceMirror.sync!(e) }

# ------------------------------------------------------------------------------
# 2. Build out a pipeline with jobs covering each interesting rate case
# ------------------------------------------------------------------------------
section.call "Creating jobs to cover each rate case"

# Place data well inside the default 7-day analytics window. The page's
# toTime is UTC midnight of today, so anything from today is excluded.
base_time = 1.day.ago.utc
info.call "Using test ref: #{test_ref}"

common_attrs = {
  project: project, ref: test_ref, source: :push,
  committed_at: base_time - 2.minutes, started_at: base_time - 1.minute, duration: 60
}
successful_pipeline = FactoryBot.create(:ci_pipeline, :success, **common_attrs, finished_at: base_time)
failed_pipeline = FactoryBot.create(:ci_pipeline, :failed, **common_attrs, finished_at: base_time)
canceled_pipeline = FactoryBot.create(:ci_pipeline, :canceled, **common_attrs, finished_at: base_time)

build_stage = FactoryBot.create(:ci_stage, pipeline: successful_pipeline, project: project, name: 'build')
test_stage = FactoryBot.create(:ci_stage, pipeline: successful_pipeline, project: project, name: 'test')

cases = [
  # name                 status     count  stage
  ['rate-100-success',   :success,  5, build_stage], # 100% success, 0% fail
  ['rate-80-success',    :success,  8, build_stage],
  ['rate-80-success',    :failed,   2, build_stage], # paired -> 80/20
  ['rate-balanced',      :success,  5, test_stage],
  ['rate-balanced',      :failed,   5, test_stage],
  ['rate-balanced',      :canceled, 2, test_stage],  # ignored by new denominator
  ['rate-90-failed',     :success,  1, test_stage],
  ['rate-90-failed',     :failed,   9, test_stage],
  ['rate-100-failed',    :failed,   5, build_stage], # 0% success, 100% fail
  ['only-canceled-job',  :canceled, 5, test_stage],  # both rates -> '-'
  ['canceled-heavy-job', :success,  1, test_stage],
  ['canceled-heavy-job', :canceled, 8, test_stage]   # old: ~11% success, new: 100%
]

builds = []
cases.each do |(name, status, count, stage)|
  pipeline = case status
             when :success then successful_pipeline
             when :failed then failed_pipeline
             when :canceled then canceled_pipeline
             end
  created = create_builds.call(
    count: count, status: status, pipeline: pipeline, stage: stage, name: name, base_time: base_time
  )
  builds.concat(created)
  info.call " #{name.ljust(20)} x#{count.to_s.rjust(2)} (#{status})"
end

# Skipped build to demonstrate the producer-side gap (no finished_at, no sync event).
builds << FactoryBot.create(
  :ci_build, :skipped,
  project: project, pipeline: successful_pipeline, ci_stage: test_stage, name: 'skipped-never-syncs'
)
info.call " skipped-never-syncs x 1 (skipped) - should NOT appear in Jobs panel"

# ------------------------------------------------------------------------------
# 3. Sync to ClickHouse
# ------------------------------------------------------------------------------
section.call "Syncing to ClickHouse"
info.call "Creating build sync events for #{builds.count(&:finished_at)} finished builds"
create_build_sync_events.call(builds)
info.call "Creating pipeline sync events"
[successful_pipeline, failed_pipeline, canceled_pipeline].each { |p| create_pipeline_sync_event.call(p) }

info.call "Running ClickHouse::DataIngestion::CiFinishedBuildsSyncService"
build_result = ClickHouse::DataIngestion::CiFinishedBuildsSyncService.new.execute
info.call " -> #{build_result.payload.except(:worker_index, :total_workers, :mode).inspect}"

info.call "Running Ci::ClickHouse::DataIngestion::FinishedPipelinesSyncService"
pipeline_result = Ci::ClickHouse::DataIngestion::FinishedPipelinesSyncService.new.execute
info.call " -> #{pipeline_result.payload.except(:worker_index, :total_workers, :mode).inspect}"

# ------------------------------------------------------------------------------
# 4. Print expected values and the URL to open
# ------------------------------------------------------------------------------
section.call "Expected values on the page"
puts <<~EXPECTED
  KPI strip (Pipelines):
    Total pipeline runs: 3 (success + failed + canceled)
    Failure rate: 50% (1 failed / (1 success + 1 failed); canceled excluded)
    Success rate: 50%
    -> info-o popover next to "Failure rate" with the formula tooltip.
    -> The two rates should sum to ~100% (was 33% + 33% = 67% under the old bug).

  Jobs panel:
    rate-100-success     5/5 success    Failure rate 0%     Success rate 100%
    rate-80-success      8/10 success   Failure rate 20%    Success rate 80%
    rate-balanced        5/10 success   Failure rate 50%    Success rate 50%
                         (the 2 canceled are excluded from the denominator;
                         old bug would have shown 41.67% / 41.67%)
    rate-90-failed       1/10 success   Failure rate 90%    Success rate 10%
    rate-100-failed      0/5 success    Failure rate 100%   Success rate 0%
    only-canceled-job    -              Failure rate -      Success rate -
                         (0 success + 0 failed -> NaN -> null -> '-')
    canceled-heavy-job   1/1 success    Failure rate 0%     Success rate 100%
                         (old bug would have shown ~0% / ~11%)
    skipped-never-syncs  NOT PRESENT (skipped builds are not synced)
    -> info-o popover next to "Failure rate (%)" column header with formula.
    -> Hover each rate cell: tooltip shows "<count> / (success+failed)" matching
       the displayed rate, not "<count> / total".
EXPECTED

section.call "Open this URL to verify"
puts "#{Gitlab.config.gitlab.url}/#{project.full_path}/-/pipelines/charts?branch=#{test_ref}"
puts " (URL includes ?branch=#{test_ref} so the page filters to just this test data.)"

# ------------------------------------------------------------------------------
# 5. Cleanup
# ------------------------------------------------------------------------------
section.call "Cleaning up"
[successful_pipeline, failed_pipeline, canceled_pipeline].each(&:destroy!)
info.call "Pipelines destroyed. Note: ClickHouse rows are not removed by this script."
puts
puts "Done."
```

MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.



