Upstream data quality checks
## Executive Summary/Description
As the team responsible for the foundational instrumentation layer, we see that data quality issues reaching customers or internal teams sometimes trace back to how events are initially captured and defined. Poor data quality undermines trust in our entire analytics ecosystem and creates support overhead when teams question their insights.
We sit at the critical first mile of the data pipeline. Issues we don't catch at instrumentation time become far more expensive to fix downstream and can impact thousands of events across multiple teams and products.
Related epic: https://gitlab.com/groups/gitlab-org/-/epics/18243+s
#### Engineering Assessment
Currently, there are multiple steps to generating analytics data:
* What needs to be instrumented?
* How is it instrumented?
* What should the data look like?
* How is the data transformed?
* How is the data visualized/consumed?
Ownership of these steps is spread across multiple teams within GitLab (the feature teams, Analytics Instrumentation, DE, AE, BU, etc.). Furthermore, the infrastructure that enables the collection, transformation, and visualisation of this data is constantly evolving while also being adapted to support other use cases such as usage billing and Value Stream Analytics.
Because this data insights process is dynamic, ever-evolving, and time-consuming, a product gap, a lack of alignment, or a bug in any of these steps can result in a data quality issue.
The purpose of this epic is to:
* Identify and define data quality SLOs and error budgets for each step in the data insights lifecycle
* Implement quality check mechanisms for each step
* Make infrastructure, process, and tooling improvements to minimize the overall feedback time for data quality issues
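
To make the SLO and error budget goal concrete, here is a minimal sketch of how a per-step error budget could be computed. Everything in it is an assumption for illustration: the `QualitySLO` class, the parts-per-million threshold, and the step name are hypothetical and do not reflect an agreed definition or existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualitySLO:
    """Hypothetical SLO for one step of the data insights lifecycle."""
    step: str
    # Assumed unit: events allowed to fail quality checks per million events.
    # 1_000 per million corresponds to a 99.9% pass-rate target.
    allowed_failures_per_million: int


def error_budget(slo: QualitySLO, total_events: int) -> int:
    """Number of events that may fail checks in the window (integer math, rounded down)."""
    return total_events * slo.allowed_failures_per_million // 1_000_000


def budget_remaining(slo: QualitySLO, total_events: int, failed_events: int) -> float:
    """Fraction of the error budget still unspent; negative means the budget is exhausted."""
    budget = error_budget(slo, total_events)
    if budget == 0:
        # No failures are allowed in this window.
        return 1.0 if failed_events == 0 else 0.0
    return 1 - failed_events / budget


# Illustrative numbers only, not real measurements.
instrumentation_slo = QualitySLO(step="instrumentation", allowed_failures_per_million=1_000)
print(error_budget(instrumentation_slo, total_events=2_000_000))
print(budget_remaining(instrumentation_slo, total_events=2_000_000, failed_events=500))
```

A negative remaining budget for a step would be one possible signal to prioritise quality work on that step over new instrumentation, which ties the SLOs back to the feedback-time goal above.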
#### Dependencies
- Team dependencies: Data Governance team, Product data insights team
- Epic/Issue dependencies: https://gitlab.com/groups/gitlab-data/-/epics/1303+s
- External dependencies: \[Any external dependencies\]
#### DRIs
- **PM**: @tjayaramaraju
<!--also add as assignee to this epic-->
- **EM**: @abilgi
<!--also add as assignee to this epic-->
- **UX/PDM**: \[Name\]
<!--also add as assignee to this epic-->
- **Group(s)**: \[Group name(s)\]
<!--also add as label-->
- **Engineering Owner**: \[Stage level EM\]
#### Initiative Driver - Product or Engineering?
- [x] **Product-driven initiatives (P1/P2/P3)** - Customer-facing features or improvements driven by Product teams that require engineering resources and commitment
  - These initiatives require a Product Priority label (P1/P2/P3)
  - They may also receive GTM tier labels (T1/T2/T3) for external communication
- [ ] **Engineering-driven initiatives (E1/E2/E3)** - Internal technical improvements that may not have customer-facing components
  - These initiatives require an Engineering Priority label (E1/E2/E3)
  - They have internal visibility only and are not externally communicated
  - Examples include: technical debt reduction, infrastructure improvements, refactoring, dependency upgrades
#### Sizing and Funding (Optional)
- **Size**: [XS/S/M/L/XL]
- **Funding Status**: [Funded/Partially funded/Not funded]
---
### Hygiene Guidelines
:bulb: _See additional details about this process at https://handbook.gitlab.com/handbook/product-development/r-and-d-interlock/_
##### :one: Pre-Interlock
- [ ] Update epic description with all relevant information
- [ ] Ensure all dependencies are identified
- [ ] Apply appropriate labels (see below)
- [ ] Apply target delivery Milestone
- [ ] Update interlock status as discussions progress (via label)
##### :two: Post-Interlock: once quarter begins
- Update health status weekly (via label)
- Document any newly identified risks or dependencies
- Link to implementation epics/issues as work begins
- Flag any scope or timeline changes immediately
<!--Apply appropriate labels:
- [ ] Section (section::dev, section::ops, section::sec)
- [ ] Stage (devops::plan, devops::create, devops::verify, etc.)
- [ ] Group (group::product planning, group::project management, etc.)
- [ ] Interlock Priority (Product labels = Interlock Priority::P1, Interlock Priority::P2, Interlock Priority::P3, Engineering labels = Interlock Priority::E1, Interlock Priority::E2, Interlock Priority::E3)
- [ ] Investment theme (Investment theme::Core-Devops, Investment theme::Security-Compliance, Investment theme::AI across SDLC)
- [ ] Platforms (platform: GitLab.com, platform: dedicated, platform: dedicated for gov, platform: self-managed)
- [ ] Subscription tier (GitLab Ultimate, GitLab Premium, GitLab Free)
- [ ] Quarter (FY27 Q1, FY27 Q2, FY27 Q3, FY27 Q4)
- [ ] Pre-interlock status label (interlock status::New/Proposal in progress, interlock status::cancelled, etc)
- [ ] Post-interlock status label (R&D roadmap status::Executing, R&D roadmap status::Completed)
- [ ] Post-interlock, once quarter begins update health weekly (health::on track, health::needs attention, health::at risk)
*For guidance on labels, see the [labels guide here](https://handbook.gitlab.com/handbook/product-development/r-and-d-interlock/#labels-guide)-->