*Note: Access to these dashboards requires appropriate permissions. Contact team leads for access requests.*
## Metrics
The Development Analytics group develops and maintains metrics to measure engineering productivity, quality, and efficiency. Each metric below is documented with its definition, methodology, current status, and known limitations.
### Defect Escape Rate
#### Current Status
- **Maturity**: Alpha
- **Updated**: Monthly (manual data collection for E2E environments)
Defect Escape Rate measures the percentage of defects that escape to production compared to those caught by automated pipelines and tests across the software development lifecycle. This metric informs the effectiveness of our testing strategy and shift-left practices. A lower rate indicates stronger quality gates preventing defects from reaching customers.
The metric supports drill-down by product group, enabling groups to track their own defect detection effectiveness.
#### How It Works
We measure "defects" in two ways:
- **Defects that escaped**: Production bugs (issues with `type::bug` label)
- **Defects caught**: Failed pipelines/tests that prevented problematic code from reaching production
The formula calculates what percentage of total defects made it to production:

`Defect Escape Rate = Defects Escaped / (Defects Escaped + Defects Caught) × 100`
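As a minimal sketch, the calculation described above can be expressed as a small function (the function name and zero-defect handling are assumptions, not part of the metric definition):

```python
# Sketch of the Defect Escape Rate formula described above (names hypothetical).

def defect_escape_rate(defects_escaped: int, defects_caught: int) -> float:
    """Percentage of total defects that reached production.

    defects_escaped: production bugs (issues labeled `type::bug`)
    defects_caught:  failed pipelines/tests that blocked defective code
    """
    total = defects_escaped + defects_caught
    if total == 0:
        return 0.0  # no defects observed in the period (assumed convention)
    return 100.0 * defects_escaped / total

# Example: 50 production bugs vs. 950 failures caught by pipelines -> 5.0%
print(defect_escape_rate(50, 950))
```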
Note: E2E metrics track failed test pipelines that validate each environment, not failures from deployment pipelines themselves. These serve as quality gates before customer impact.
For `gitlab-foss`: Only direct failures (push, schedule, merge_request_event sources) are counted. Downstream pipelines (source = `pipeline` or `parent_pipeline`) are excluded to avoid double-counting failures already captured in parent `gitlab-org/gitlab` pipelines.
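The `gitlab-foss` filtering rule above can be sketched as follows. Field names are assumed to mirror the GitLab pipelines API, where each pipeline record carries a `source` and a `status`; this is an illustration of the counting rule, not the production query:

```python
# Hypothetical sketch of the gitlab-foss source filter.

DIRECT_SOURCES = {"push", "schedule", "merge_request_event"}
# Downstream sources are excluded: their failures are already captured
# in the parent gitlab-org/gitlab pipelines.
DOWNSTREAM_SOURCES = {"pipeline", "parent_pipeline"}

def count_foss_failures(pipelines: list[dict]) -> int:
    """Count failed gitlab-foss pipelines, skipping downstream ones."""
    return sum(
        1
        for p in pipelines
        if p["status"] == "failed" and p["source"] in DIRECT_SOURCES
    )

pipelines = [
    {"status": "failed", "source": "push"},
    {"status": "failed", "source": "parent_pipeline"},  # excluded (downstream)
    {"status": "success", "source": "schedule"},        # excluded (not failed)
]
print(count_foss_failures(pipelines))  # 1
```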
**Important Context on Measurement Precision:**
The current implementation uses "failed pipeline" as a proxy for "defect caught," which includes all pipeline failures (infrastructure issues, timeouts, linting errors, etc.), not just test failures indicating functional defects. This broad definition results in Defect Escape Rate values around 5-10%.
Future iterations measuring only test failures (functional defects) will likely show Defect Escape Rate values around 20-40%. This increase reflects that many pipeline failures catch non-functional issues (infrastructure, configuration) rather than code defects that would affect customers. The higher percentage doesn't indicate worse quality; it reflects a more precise measurement of how effectively tests catch functional defects.
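A worked example makes the shift concrete. The counts below are hypothetical, chosen only to land inside the 5-10% and 20-40% bands stated above: narrowing "defects caught" shrinks the denominator, so the rate rises even though the number of escaped defects is unchanged:

```python
# Illustrative (made-up) counts showing why a narrower "caught" definition
# raises the rate without any change in actual quality.

def rate(escaped: int, caught: int) -> float:
    return 100 * escaped / (escaped + caught)

escaped = 50             # production bugs, identical in both views
broad_caught = 950       # all pipeline failures: infra, timeouts, linting, tests
test_only_caught = 150   # only failures caused by functional test defects

print(round(rate(escaped, broad_caught), 1))      # 5.0  -> current 5-10% band
print(round(rate(escaped, test_only_caught), 1))  # 25.0 -> future 20-40% band
```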
**Group-Level Defect Escape Rate:**
Defect Escape Rate can be filtered by product group using MR `group::` labels. The underlying assumption is that engineers from a given group primarily generate defects in code they're responsible for - defects their test suite should catch.
Specifically:
- Bugs assigned to groups via `group::` labels on issues
- MR pipeline failures assigned to groups via `group::` labels on merge requests
- Only MR pipeline failures can be attributed (we don't have `group::` labels on Master pipelines or E2E test pipelines)
MRs and issues don't always have group labels set (e.g., 13% of MRs and 6% of issues in Oct-Dec 2025 lacked group labels).
Future iterations would ideally use test ownership (`feature_category`) for attribution, providing direct measurement of which tests failed rather than inferring ownership from MR authorship. This requires adding group ownership data for all test frameworks, not just backend tests.
#### Known Limitations
**Data Collection:**
- E2E pipeline failures retrieved manually via ops.gitlab.net API (not automated)
- ops.gitlab.net pipeline data not available in ClickHouse or current Snowflake (legacy data stopped Aug 2025)
- ClickHouse is our platform of choice for this metric, but we currently lack most required data (issues, merge requests, E2E pipelines). We plan to add this data in Q1 2026.
**Global Defect Escape Rate Limitations:**
- Current version counts all pipeline failures (infrastructure, timeouts, linting) not just functional test failures
- More precise test-only measurement would ideally be implemented in ClickHouse once data is available
**Group Attribution Limitations:**
- Group-level Defect Escape Rate only includes MR pipeline failures (Master/E2E failures cannot be attributed without group labels on those pipelines)
- Group Defect Escape Rate percentages will run higher than the global Defect Escape Rate because the denominator is smaller (MR pipeline failures only vs. all SDLC stages)
- MR label attribution assumes engineers primarily create defects in their own code areas, which may not hold for cross-functional work
- MRs and issues don't always have group labels
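The denominator effect noted in the list above can be shown with hypothetical counts: dropping Master and E2E failures from the denominator while keeping the same escaped bugs mechanically raises the rate:

```python
# Made-up counts illustrating why group-level rates exceed the global rate:
# the group denominator contains only MR pipeline failures.

escaped = 40

caught_mr, caught_master, caught_e2e = 360, 400, 200  # all SDLC stages

global_rate = 100 * escaped / (escaped + caught_mr + caught_master + caught_e2e)
group_style_rate = 100 * escaped / (escaped + caught_mr)  # MR failures only

print(round(global_rate, 1))       # 4.0
print(round(group_style_rate, 1))  # 10.0
```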
**Metric Variability:**
Defect Escape Rate is inherently variable and can be influenced by factors unrelated to actual quality improvements:
- **CI capacity constraints** may reduce pipeline execution, potentially masking defects
Until we can filter these confounding factors, month-to-month Defect Escape Rate changes should be interpreted cautiously. Sustained trends over multiple months are more meaningful than single-month variations.
#### Planned Improvements
**Q1 2026:**
- Automate E2E pipeline data ingestion from ops.gitlab.net to ClickHouse
- Add issue and merge request data to ClickHouse for full automation
- Construct the dashboard in ClickHouse
- Refine "defects caught" to count only pipelines that failed due to RSpec or Jest test failures (note: still includes flaky tests and master-broken incidents)
**Future:**
- Filter out infrastructure failures, flaky tests, and master-broken incidents for cleaner measurement
- Expand test ownership data (`feature_category`) to enable accurate group attribution based on which tests failed