Add CI/CD job metrics to the CI/CD Analytics View for projects (Limited Availability) (#18548)

<details>
<summary>

## PMM Summary content:

</summary>

**Feature title**: Job Performance Metrics in CI/CD Analytics

**Value Proposition**: Identify and fix slow or failing CI/CD jobs faster, without leaving GitLab, to improve developer velocity and reduce pipeline troubleshooting time.

**Primary Target Audience:**

- Platform engineers managing enterprise CI/CD
- Development teams with long-running pipelines or high pipeline failure rates

**Marketing Channels:**

- Release Post (must-have)
- Blog Post (optional)
- Other (tbd)

**Key Benefits:**

Development Teams:

- Quickly pinpoint which specific jobs are slowing down pipelines (P50/P95 duration visibility)
- Identify flaky or problematic jobs through failure rate tracking
- Eliminate guesswork when optimizing 3-4 hour pipeline runtimes down to manageable durations
- Make data-driven decisions on which jobs to optimize first (sorted by duration by default)

Platform Engineers/Admins:

- Reduce dependency on external observability tools or custom-built solutions
- Get a unified view of job performance alongside pipeline metrics in one place
- Operate GitLab at enterprise scale with built-in CI/CD observability
- Search and sort by job name, stage, duration, or failure rate

**Release Post Content Draft**

Job Performance Metrics in CI/CD Analytics

GitLab now displays job-level performance metrics directly in your CI/CD Analytics view, giving platform engineers and development teams the visibility they need to identify and fix slow or failing jobs without leaving GitLab. See at a glance which specific jobs are slowing down your pipelines with P50 and P95 duration metrics, track failure rates to identify flaky jobs, and make data-driven decisions on where to focus optimization efforts. The new job metrics table is sortable and searchable, with jobs sorted by duration by default so you can immediately spot your biggest bottlenecks.

No more building custom observability solutions or juggling external tools: everything you need to bring 3-4 hour pipelines down to manageable runtimes is now in one place. Available in Limited Availability for the Premium and Ultimate tiers, this feature helps teams operating GitLab at enterprise scale reduce pipeline troubleshooting time and improve developer velocity.

</details>

## Status (2026-02-11)

CI/CD job performance metrics are now live on GitLab.com as of GitLab **18.8**. See https://docs.gitlab.com/user/analytics/ci_cd_analytics/#cicd-job-performance-metrics for the available metrics.

## What's Coming Next: Grouping by Stage

https://gitlab.com/groups/gitlab-org/-/work_items/21196

### Feature summary

Display CI/CD job metrics for each job in the pipeline on the CI/CD analytics page. The default time range for the metrics is the last 30 days. The metrics to display for each job are:

- Job name
- Stage
- P50 duration
- P95 duration
- Failure rate

## GitLab Tier

- Premium
- Ultimate

## Problem

<details>
<summary>Problem validation summary</summary>

## Problem validation summary:

**Admins/platform engineers want to see job and pipeline metrics as well as runner metrics in the same view:** When thinking about how to optimize pipelines, runners are a major aspect, but so are the pipeline configuration and even the repository setup. Users need a single place where they can make effective decisions on how to optimize CI/CD, which specifically means including more extensive metrics for pipelines and jobs.
### Example customer problem to solve:

Current state:

- Long pipeline execution time (3.5 to 4 hours) for the full pipeline
- The full pipeline runs as a nightly build
- Already includes optimizations for matrix builds
- For feature branches: a “short” pipeline with about a 1h runtime, but it doesn’t test everything

Challenges:

- They need to run their pipelines on machines with real-time kernels with custom patches (on-premises hardware stack).
- They want to optimize pipeline execution and need data to do so efficiently, including:
  - Which part of the pipeline takes how much time?
  - How often do jobs fail, and why? (they have flaky pipeline jobs)
  - How long do certain infrastructure-related steps take?

Workaround solution:

- Built observability-like features themselves
- Would like to see more and better data and visibility inside GitLab
- Found an issue on a specific runner machine using that self-built tool

![Screenshot_2025-04-09_at_11.19.02_AM](https://gitlab.com/-/project/278964/uploads/b4ffeb95af443c73fd4a04c61a83f895/Screenshot_2025-04-09_at_11.19.02_AM.png)

</details>

## Proposal

* Add a new panel to the dashboard with a table of job metrics with the following columns:
  * job name
  * stage it belongs to
  * mean duration
  * p95 duration
  * failure rate
* The panel should show pagination when there are more than 10 items
* Each column should be sortable
* By default, the table **should be sorted by mean duration**
* The user should be able to use the search bar in the panel to **search for a job name**

:art: [Design in design management](https://gitlab.com/gitlab-org/gitlab/-/issues/453956/designs/job-analytics.png)

:paintbrush: [Figma file](https://www.figma.com/design/ZjFKeXSfmG2KrWPeb47Slt/Fleet-Visibility-Metrics?node-id=1897-7485&t=NOl19pOh1CeIhv16-1)

### Out of scope and needs to be explored in follow-up issues

The following items came from the feedback phase of this issue and _require validation before implementation_ :point_down:

- Job duration trends (for example, the duration has been trending up in the past week)
- Specific metrics on test jobs
- Filtering the job panel by other data like stage or job tag(s)
- Viewing metrics related to a single job in the table (see [exploration here](https://gitlab.com/gitlab-org/gitlab/-/issues/453956/designs/job-analytics-2nd-level.png))
- Viewing a breakdown of failures that make up the failure rate of a pipeline or job
- Actions to take based on the data (for example, an action to update the `.gitlab-ci.yml` file for a job's configuration to improve duration)
- Visually indicating anomalies in the metrics (for example, a job's failure rate is 7% when it is usually 1%)
- Visually indicating metrics above a set threshold (for example, above 5% failure rate or 1 min duration), and allowing users to set custom thresholds for notifications
- Predicting metrics based on historical data
- Using AI to automatically optimize job or pipeline performance, improving speed, status, or cost

## Implementation plan

We'll need to build a materialized view that aggregates the required job statistics by job name (per project, pipeline source, ref, stage, and day):

```sql
-- Materialized view: one aggregated row per job per day
CREATE MATERIALIZED VIEW gitlab_clickhouse_development.ci_job_performance_daily_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (project_id, source, ref, name, stage_id, date)
AS SELECT
    toDate(b.finished_at) AS date,
    b.project_id,
    b.stage_id,
    b.name,
    p.source,
    p.ref,
    quantilesState(0.5, 0.95)(b.duration) AS duration_quantiles,
    countState() AS total_builds,
    countStateIf(b.status = 'failed') AS failed_builds
FROM gitlab_clickhouse_development.ci_finished_builds b
-- Note: the MV is triggered only by inserts into ci_finished_builds (the left table);
-- a build whose pipeline is not yet in ci_finished_pipelines is not matched here.
INNER JOIN gitlab_clickhouse_development.ci_finished_pipelines p ON b.pipeline_id = p.id
WHERE b.finished_at > 0 -- Ensure we have valid finished times
GROUP BY date, b.project_id, b.stage_id, b.name, p.source, p.ref;
```

We can then backfill the last 180 days with:

```sql
-- Builds inserted after the MV was created are already aggregated by it;
-- consider an upper bound on this range so they are not counted twice.
INSERT INTO gitlab_clickhouse_development.ci_job_performance_daily_mv
SELECT
    toDate(b.finished_at) AS date,
    b.project_id,
    b.stage_id,
    b.name,
    p.source,
    p.ref,
    quantilesState(0.5, 0.95)(b.duration) AS duration_quantiles,
    countState() AS total_builds,
    countStateIf(b.status = 'failed') AS failed_builds
FROM gitlab_clickhouse_development.ci_finished_builds b
INNER JOIN gitlab_clickhouse_development.ci_finished_pipelines p ON b.pipeline_id = p.id
WHERE b.finished_at > 0
  AND b.finished_at >= today() - INTERVAL 180 DAY
GROUP BY date, b.project_id, b.stage_id, b.name, p.source, p.ref;
```

The query to populate the job metrics table would look like the following:

```sql
-- Last 30 days, filtered by source and ref
SELECT
    project_id,
    stage_id,
    name,
    countMerge(total_builds) AS total_builds,
    countMerge(failed_builds) AS failed_builds,
    quantilesMerge(0.5, 0.95)(duration_quantiles) AS duration_percentiles,
    duration_percentiles[1] AS p50_duration,
    duration_percentiles[2] AS p95_duration,
    if(total_builds > 0, failed_builds / total_builds, 0) AS failure_rate
FROM gitlab_clickhouse_development.ci_job_performance_daily_mv
WHERE date >= today() - INTERVAL 30 DAY
  AND project_id = ?
  AND source = ? -- Filter by source
  AND ref = ?    -- Filter by ref
GROUP BY project_id, stage_id, name
ORDER BY p50_duration DESC;
```

_This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc._
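As a rough cross-check of what the materialized view and the 30-day read query in the implementation plan compute, the following Python sketch applies the same grouping to in-memory data. It is an illustration under stated assumptions, not GitLab's implementation: the `Build` and `JobMetrics` types and the `job_metrics` and `percentile` helpers are hypothetical, and builds are assumed to be already filtered to one project, pipeline source, and ref. It uses exact, linearly interpolated percentiles, whereas ClickHouse's `quantiles` functions are approximate, so P50/P95 values can differ slightly on large inputs.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Build:
    """Hypothetical finished CI job, already filtered to one project, source, and ref."""
    name: str
    stage: str
    duration_s: float
    status: str  # for example "success" or "failed"
    finished_on: date


@dataclass(frozen=True)
class JobMetrics:
    """One row of the job metrics table (hypothetical shape)."""
    name: str
    stage: str
    total_builds: int
    failed_builds: int
    p50_duration: float
    p95_duration: float
    failure_rate: float


def percentile(sorted_values, q):
    """Exact percentile, linearly interpolated between the two closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def job_metrics(builds, today, window_days=30):
    """Aggregate builds per (stage, name) over the window, slowest P50 first."""
    # Mirrors `date >= today() - INTERVAL 30 DAY` in the read query.
    cutoff = today - timedelta(days=window_days)
    groups = defaultdict(list)
    for build in builds:
        if build.finished_on >= cutoff:
            groups[(build.stage, build.name)].append(build)

    rows = []
    for (stage, name), group in groups.items():
        durations = sorted(b.duration_s for b in group)
        failed = sum(1 for b in group if b.status == "failed")
        rows.append(
            JobMetrics(
                name=name,
                stage=stage,
                total_builds=len(group),
                failed_builds=failed,
                p50_duration=percentile(durations, 0.50),
                p95_duration=percentile(durations, 0.95),
                # Groups are never empty here, so no zero-division guard is needed.
                failure_rate=failed / len(group),
            )
        )
    # Same default ordering as `ORDER BY p50_duration DESC`.
    rows.sort(key=lambda r: r.p50_duration, reverse=True)
    return rows
```

For example, three `rspec` runs of 100, 200, and 300 seconds (one failed) inside the window yield a P50 of 200 seconds and a failure rate of 1/3, while a run finished 40 days earlier is ignored.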

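The table behavior listed in the Proposal (search by job name, sortable columns, pagination after 10 items, slowest jobs first by default) can be sketched as a single query helper. This is a hypothetical illustration, not GitLab's frontend or GraphQL code: `JobRow`, `query_table`, `PAGE_SIZE`, and `SORTABLE_COLUMNS` are names invented for this sketch. It defaults to sorting by P50 duration to match the `ORDER BY` in the implementation plan, although the Proposal text refers to mean duration.

```python
from dataclasses import dataclass
from math import ceil

PAGE_SIZE = 10  # the Proposal paginates after 10 items
SORTABLE_COLUMNS = {"name", "stage", "p50_duration", "p95_duration", "failure_rate"}


@dataclass(frozen=True)
class JobRow:
    """Hypothetical row shape for the job metrics table."""
    name: str
    stage: str
    p50_duration: float
    p95_duration: float
    failure_rate: float


def query_table(rows, search="", sort_by="p50_duration", descending=True, page=1):
    """Search by job name, sort by one column, and return (page_rows, total_pages)."""
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"unsortable column: {sort_by}")
    # Case-insensitive substring match on the job name; "" matches every row.
    needle = search.casefold()
    matches = [r for r in rows if needle in r.name.casefold()]
    # list.sort() stays stable even with reverse=True, so ties keep their input order.
    matches.sort(key=lambda r: getattr(r, sort_by), reverse=descending)
    total_pages = max(1, ceil(len(matches) / PAGE_SIZE))
    start = (page - 1) * PAGE_SIZE
    return matches[start:start + PAGE_SIZE], total_pages
```

Filtering before sorting keeps search and sort independent of each other. A server-side variant could instead push the name filter, `ORDER BY`, and `LIMIT`/`OFFSET` into the ClickHouse query, which avoids loading every job row for large projects.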