Instrumentation Audit across Core DevOps

Overview

This initiative aims to audit and assess our current instrumentation across the DevOps product portfolio to understand whether we have the right usage data in place. Quality engineering decisions should be data-driven: we need to know whether areas with bugs/tech debt actually have user adoption, or whether we should consider deprecation/removal instead of investment.

DRI: @jdrpereira

DRI Responsibilities

  • Coordinate with DevOps teams to understand current instrumentation
  • Work cross-functionally to assess data availability
  • Document findings in standardized format for each team
  • Present weekly updates at Product Quality Standup
  • Deliver final recommendations with usage-based prioritization

Business Context

As we scale to serve enterprise customers who rely on GitLab as their "Tier 0 platform," we must make strategic decisions about where to invest our engineering capacity. Understanding feature usage helps us:

  • Prioritize bug fixes in high-usage areas vs. burning down bugs against unused features
  • Make informed decisions about tech debt and maintenance vs. complete feature removal
  • Optimize engineering investment based on actual customer value
  • Support our path to $2B revenue by focusing on what customers actually use 🚀

Success Criteria

  • Complete instrumentation coverage audit across all DevOps stages
    • This could be based on Product Category, Feature, or another suitable grouping
  • Identify gaps in usage data collection
    • What usage metrics are currently tracked?
    • Is it the right data for the purpose?
    • Can we access this data easily? Where can it be viewed?
    • Is the existing data reliable and actionable?
  • Establish baseline for ongoing instrumentation health
  • Enable data-driven decisions about feature deprecation vs. investment

For each area with bugs/tech debt, we should be able to provide (see the sketch after this list):

  • Usage assessment: High/Medium/Low/Unknown usage
  • Investment recommendation: Fix/Maintain/Deprecate/Remove
  • Instrumentation improvement plan: What to add/fix to improve decision-making
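
For illustration, this per-feature deliverable could be captured in a small record like the sketch below. This is a minimal sketch only; the type and field names are hypothetical, not an agreed schema.

```python
from dataclasses import dataclass
from enum import Enum


class UsageLevel(Enum):
    """Usage assessment buckets used by the audit."""
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    UNKNOWN = "Unknown"


class Recommendation(Enum):
    """Investment recommendation per feature."""
    FIX = "Fix"
    MAINTAIN = "Maintain"
    DEPRECATE = "Deprecate"
    REMOVE = "Remove"


@dataclass
class AreaAssessment:
    """One audited area with bugs/tech debt (hypothetical field names)."""
    feature: str
    usage: UsageLevel
    recommendation: Recommendation
    instrumentation_plan: str  # what to add/fix to improve decision-making


# Example entry for a hypothetical feature:
example = AreaAssessment(
    feature="example-feature",
    usage=UsageLevel.UNKNOWN,
    recommendation=Recommendation.MAINTAIN,
    instrumentation_plan="Add a usage counter before deciding on investment.",
)
```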

Work Plan

The following key decisions shaped the rationale and structure of this plan:

  • Feature-level granularity: Audit individual features, which can then be rolled up to feature categories and stages as needed.
  • Pilot-first approach: Run Package audit first (the initiative DRI's own stage) to validate the framework, confirm timing estimates are realistic, and create a concrete example before scaling to other stages.
  • Distributed execution: Each stage provides their own DRI to conduct audits in parallel, removing bottlenecks and leveraging domain expertise. The initiative DRI provides framework and oversight to ensure consistency and reduce unconscious bias from self-evaluation.

Week 1: Setup & Kickoff

  • Create the plan and update the issue used to track all related work
  • Meet with Analytics Instrumentation team to understand and assess the available instrumentation tooling and current practices
  • Get access to all relevant data platforms and dashboards
  • Contact EMs/PMs for each stage to introduce initiative and request a DRI to help conduct the audit:
    • Create
    • Deploy
    • Package
    • Plan
    • Runner
    • Verify

Week 2: Audit Framework

  • Create a standardized audit template including (sketched after this list):
    • Feature-level inventory checklist
    • Feature → Category → Stage rollup
    • Current instrumentation status fields per feature
    • Usage classification criteria (High/Medium/Low/Unknown thresholds)
    • Data quality score (reliability, completeness, accessibility)
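
As a concrete illustration of the template fields above, a single audit row and the usage classification could look like the sketch below. Field names and the event-count thresholds are placeholder assumptions; the real cut-offs should be settled with the Analytics Instrumentation team during the pilot.

```python
from dataclasses import dataclass


@dataclass
class AuditRow:
    """One feature-level entry in the audit template (hypothetical fields)."""
    feature: str
    category: str               # supports Feature -> Category -> Stage rollups
    stage: str
    instrumented: bool          # current instrumentation status
    metrics: list[str]          # names of tracked usage metrics, if any
    monthly_events: int | None  # None when no usage data is available
    reliability: int            # data quality subscores, e.g. 1-5 each
    completeness: int
    accessibility: int


def classify_usage(row: AuditRow) -> str:
    """Map raw event volume onto High/Medium/Low/Unknown.

    The thresholds below are placeholders, not agreed values.
    """
    if not row.instrumented or row.monthly_events is None:
        return "Unknown"
    if row.monthly_events >= 10_000:
        return "High"
    if row.monthly_events >= 100:
        return "Medium"
    return "Low"
```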

Week 3: Pilot Audit & Refinement

  • Conduct pilot audit on Package stage
  • Refine audit framework based on pilot feedback/results
  • Create a scoring system for investment recommendations per feature (see the sketch after this list)
  • Document a step-by-step guide, using the Package audit as an example
  • Confirm, based on the Package pilot, that 2 weeks is a realistic estimate for executing the audit on the remaining stages
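
The scoring system could combine the usage classification with bug and maintenance-cost data, roughly as sketched below. The decision rules are assumptions for the pilot to validate or replace.

```python
def recommend(usage: str, open_bugs: int, maintenance_cost: str) -> str:
    """Map audit data to Fix/Maintain/Deprecate/Remove (placeholder rules)."""
    if usage == "High":
        # High-usage features justify active investment.
        return "Fix" if open_bugs > 0 else "Maintain"
    if usage in ("Low", "Unknown"):
        # Low or unproven value: removal if it is also expensive to keep.
        return "Remove" if maintenance_cost == "High" else "Deprecate"
    # Medium usage: keep healthy without major new investment.
    return "Maintain"
```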

Week 4-6: Audit Execution

  • Kick off the audit on the remaining stages with their DRIs
  • Complete feature-level audits on the remaining stages:
    • Create
    • Deploy
    • Plan
    • Runner
    • Verify

Week 7-9: Analysis & Deliverables

Analysis

  • Compile all feature-level audit findings into a single dataset (see the script after this list)
  • Create rollup views
  • Cross-reference usage data with bug counts and tech debt
  • Identify patterns across stages (systemic instrumentation gaps)
  • For each feature, determine:
    • Usage assessment: High/Medium/Low/Unknown usage
    • Investment recommendation: Fix/Maintain/Deprecate/Remove
    • Instrumentation improvement plan
  • Create feature-level usage vs. maintenance cost matrix
  • Flag "zombie features" (high maintenance, low usage)
  • Highlight "under-invested gems" (high usage, high bugs)
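
If each stage delivers its audit as a CSV export of the template, the compilation, rollups, matrix, and flags could be produced with a short script like the one below. This is a sketch; the file locations and column names are assumptions mirroring the template fields.

```python
import glob

import pandas as pd

# Compile all per-stage audit files into a single dataset (assumed layout).
frames = [pd.read_csv(path) for path in glob.glob("audits/*_audit.csv")]
df = pd.concat(frames, ignore_index=True)

# Rollup views: feature -> category -> stage.
category_rollup = df.groupby(["stage", "category"]).agg(
    features=("feature", "count"),
    open_bugs=("open_bugs", "sum"),
)

# Feature-level usage vs. maintenance cost matrix (feature counts per cell).
matrix = pd.crosstab(df["usage"], df["maintenance_cost"])

# "Zombie features": high maintenance, low (or unknown) usage.
zombies = df[(df["maintenance_cost"] == "High") & (df["usage"].isin(["Low", "Unknown"]))]

# "Under-invested gems": high usage, high bug counts (threshold is a placeholder).
gems = df[(df["usage"] == "High") & (df["open_bugs"] >= 10)]

print(category_rollup)
print(matrix)
print("Zombie feature candidates:", sorted(zombies["feature"]))
print("Under-invested gems:", sorted(gems["feature"]))
```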

Stakeholder Review

  • Review findings with each stage EM/PM
  • Refine recommendations based on feedback

Report

Available at https://instrumentation-audit-054c5e.gitlab.io/.

  • Document feature-by-feature breakdown with category/stage rollups
  • Create executive summary with:
    • Top feature deprecation candidates across stages
    • Top feature investment priorities
    • Category and stage level insights
  • Share findings with leadership

Handoff

  • Create specific action items for each team
  • Establish ongoing instrumentation health monitoring process
  • Hand off ownership to EMs/PMs for follow-through

Stage DRIs

  • Create: @psjakubowska, @jwoodwardgl, @adebayo_a
  • Plan: @pskorupa, @fernanda.toledo
  • Verify:Pipelines: @furkanayhan (group::pipeline authoring), @allison.browne (group::pipeline execution)
  • Deploy: @timofurrer, @tigerwnz
  • Runner: @ratchade, @avonbertoldi (Category:Runner Core); @pedropombeiro, @narendran-kannan (Category:Fleet Visibility)
  • Package: @jdrpereira