The Data Quality Program establishes comprehensive standards and procedures to enhance trust in GitLab's data assets, enabling accurate insights and effective decision-making while reducing manual data correction efforts. This program is led by the Data Governance & Quality team and covers all enterprise data domains.
### Program Objectives
- **Measure** data quality objectives across all dimensions
- **Track** progress against specific, measurable targets
- **Report** on program effectiveness at multiple levels
- **Improve** continuously through data-driven insights
## Data Quality Framework
### Six Dimensions of Data Quality
The GitLab Data Quality Program measures quality across six key dimensions:
| Dimension | Definition |
|-----------|------------|
| **Accuracy** | Data correctly represents real-world entities and values |
| **Completeness** | All required data fields are populated |
| **Consistency** | Data aligns across different systems and over time |
| **Timeliness** | Data is available within expected timeframes |
| **Validity** | Data conforms to defined formats and business rules |
| **Uniqueness** | No inappropriate duplicate records exist |
*Note: Specific target thresholds will be established through baseline measurements and domain-specific requirements.*
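
As an illustration of how a few of these dimensions might be turned into measurable scores during a baseline assessment, the following is a minimal sketch in Python. The sample records, field names, and the "no future dates" rule are hypothetical and are not part of the program definition; the actual metrics and thresholds will come from the domain-specific work described in this page.

```python
from datetime import date

# Hypothetical sample records; field names are illustrative only.
records = [
    {"customer_id": "C001", "signup_date": date(2024, 1, 15)},
    {"customer_id": "C002", "signup_date": date(2024, 3, 2)},
    {"customer_id": None, "signup_date": date(2024, 5, 9)},
    {"customer_id": "C002", "signup_date": date(2999, 1, 1)},
]


def completeness(rows, field):
    """Share of rows where the field is populated (not None or empty)."""
    if not rows:
        return 1.0
    populated = sum(1 for r in rows if r.get(field) not in (None, ""))
    return populated / len(rows)


def uniqueness(rows, field):
    """Share of populated values that are distinct."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    return len(set(values)) / len(values) if values else 1.0


def validity(rows, field, today):
    """Share of rows whose date is not in the future (a sample business rule)."""
    if not rows:
        return 1.0
    valid = sum(1 for r in rows if r[field] <= today)
    return valid / len(rows)


# A fixed reference date keeps the example reproducible.
today = date(2025, 1, 1)
scores = {
    "completeness": completeness(records, "customer_id"),
    "uniqueness": uniqueness(records, "customer_id"),
    "validity": validity(records, "signup_date", today),
}
print(scores)
```

Expressing each dimension as a ratio between 0 and 1 is one possible design choice; it makes scores comparable across dimensions and easy to check against a target threshold once one is agreed.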
### Implementation Approach
The Data Quality Program will be implemented through close partnership with domain stakeholders and functional data stewards. Each domain's unique requirements and challenges will be addressed through:
- Collaborative baseline assessments with domain teams
- Co-development of domain-specific quality thresholds
- Joint ownership of improvement initiatives
- Regular touchpoints with domain stewards for continuous refinement
**Timeline:**
- **FY27**: Pilot implementation for Product domain to establish frameworks, processes, and best practices
- **FY27-28**: Expand program to additional domains based on pilot learnings and domain readiness
## Reporting and Managing Data Quality Issues
### When to Open a Data Quality Issue
Open a Data Quality issue when you discover:
- **Inaccurate Data** - Values that don't match reality (e.g., incorrect revenue amounts)
- **Missing Data** - NULL or empty fields that should exist (e.g., missing customer IDs)
- **Inconsistent Data** - Conflicting information across systems (e.g., different customer counts in Salesforce vs. Snowflake)
- **Untimely Data** - Outdated, stale, or delayed data updates (e.g., dashboards not refreshing)
- **Invalid Data** - Format violations or business rule breaches (e.g., future dates for historical events)
- **Duplicate Data** - Repeated records where uniqueness is expected (e.g., duplicate customer records)
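
The six issue categories above line up one-to-one with the six dimensions of the framework. The sketch below makes that correspondence explicit in Python. The mapping is inferred from the order and wording of the two lists on this page, and the function name `dimension_for` is hypothetical; treat it as an illustration rather than an official lookup.

```python
# Assumed mapping from issue category to framework dimension,
# inferred from the category list and the dimensions table.
CATEGORY_TO_DIMENSION = {
    "Inaccurate Data": "Accuracy",
    "Missing Data": "Completeness",
    "Inconsistent Data": "Consistency",
    "Untimely Data": "Timeliness",
    "Invalid Data": "Validity",
    "Duplicate Data": "Uniqueness",
}


def dimension_for(category: str) -> str:
    """Return the quality dimension for an issue category (hypothetical helper)."""
    key = category.strip().title()  # tolerate case and surrounding whitespace
    if key not in CATEGORY_TO_DIMENSION:
        raise ValueError(f"Unknown data quality issue category: {category!r}")
    return CATEGORY_TO_DIMENSION[key]


print(dimension_for("missing data"))
```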
### How to Report a Data Quality Issue
<details>
<summary><b>Step 1: Create the Data Quality Issue</b></summary>
1. Navigate to the [Analytics project](https://gitlab.com/gitlab-data/analytics/-/issues) in GitLab
2. Click "New Issue", then select and apply the **[Report] Data Quality Issue** template
3. If converting an existing issue, use `/label ~"Data Quality Issue"`
Select the appropriate severity level (Sev1-4) based on business impact as defined in the issue template.
**⚠️ For Sev1/Sev2 issues:** Immediately notify the #data-team Slack channel and tag `@data-governance`
For more details on incident management and severity levels, see the [Data Team Incident Management](/handbook/enterprise-data/data-governance/incident-management/) handbook page.
##### Problem Description
Provide comprehensive details in all required fields of the template, including technical evidence.
**Technical Evidence:**
Complete the evidence section in the issue template with relevant SQL queries, screenshots, and data samples.
##### Impact Assessment
Complete all impact fields in the template (Customer, ARR, Records, Strategic Impact).
##### Systems Information
Complete the systems and domain checkboxes provided in the issue template:
- **Primary Affected System** - Select all systems where the issue occurs
- **Data Domain Affected** - Identify which business domain is impacted
1. **Validate Severity** - Confirm it matches business impact
2. **Check for Duplicates** - Search for similar existing issues
3. **Apply DQ Dimension Label** - Use appropriate `DQ-[Dimension]` label
4. **Assign DRI** - Based on issue type
5. **Set Workflow State** - Move to appropriate stage
6. **Communicate** - Notify via Slack if Sev1/Sev2
</details>
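
Parts of the triage checklist above can be expressed as simple rules. The following Python sketch covers three of them: validating the severity, building the dimension label, and deciding whether a Slack notification is needed. The function name `triage` and the exact label spelling (for example `DQ-Accuracy`) are assumptions based on the `DQ-[Dimension]` pattern; duplicate checks, DRI assignment, and workflow state are left out because this page does not specify rules for them.

```python
VALID_SEVERITIES = ("Sev1", "Sev2", "Sev3", "Sev4")
DIMENSIONS = (
    "Accuracy",
    "Completeness",
    "Consistency",
    "Timeliness",
    "Validity",
    "Uniqueness",
)


def triage(severity: str, dimension: str) -> dict:
    """Return labels to apply and whether to notify Slack (hypothetical helper)."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"Severity must be one of {VALID_SEVERITIES}, got {severity!r}")
    if dimension not in DIMENSIONS:
        raise ValueError(f"Dimension must be one of {DIMENSIONS}, got {dimension!r}")
    return {
        # Label spelling is assumed from the DQ-[Dimension] pattern.
        "labels": ["Data Quality Issue", f"DQ-{dimension}"],
        # Sev1 and Sev2 issues call for an immediate Slack notification.
        "notify_slack": severity in ("Sev1", "Sev2"),
    }


result = triage("Sev2", "Completeness")
print(result)
```

In this sketch an unrecognised severity or dimension raises an error instead of falling back to a default, so a mistyped value is caught at triage time.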
### Data Quality Issue Management Workflow
**Detailed Workflow Diagram - Coming Soon**
A comprehensive workflow diagram detailing decision points, escalation paths, and automated triggers for data quality issue management is currently being developed and will be added to this handbook page.
*For current procedures, please follow the steps outlined in the sections above.*
### Root Cause Analysis
All resolved issues require root cause classification: