FY25 Roadmap - Test and Tools Infrastructure
Welcome to the Test and Tools Infrastructure team's FY25 roadmap. We remain open to your feedback and will adjust our plans accordingly.
Track: Test Tooling Dashboard & Monitoring

- ⌛ Establish Observability on Test Reliability and Efficiency (#2184)
  - Build observability for Test Execution Time Metrics (#2244 (closed))
  - Build observability for Test Economy Metrics (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/2245)
  - Build observability for Test Flakiness Quotient Metrics (#2259 (closed))
  - Build observability for Test Infra Availability Metrics (#2260 (closed))
  - Add alerts when the test pass rate or flakiness rate crosses certain thresholds to improve visibility of bad tests (#2591)
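The proposed alerting (#2591) amounts to a threshold check over recent run data. A minimal sketch, assuming hypothetical metric fields (`passed`, `failed`, `flaky`) and illustrative thresholds, not the actual implementation:

```python
# Hypothetical threshold check for test pass-rate/flakiness alerts.
# Metric field names and threshold values are assumptions for illustration.

PASS_RATE_MIN = 0.95   # alert when pass rate drops below 95%
FLAKINESS_MAX = 0.02   # alert when flakiness rate exceeds 2%

def evaluate_test_health(runs: list[dict]) -> list[str]:
    """Return alert messages for tests crossing the configured thresholds."""
    alerts = []
    for run in runs:
        total = run["passed"] + run["failed"]
        if total == 0:
            continue
        pass_rate = run["passed"] / total
        flakiness = run["flaky"] / total
        if pass_rate < PASS_RATE_MIN:
            alerts.append(f"{run['name']}: pass rate {pass_rate:.1%} below threshold")
        if flakiness > FLAKINESS_MAX:
            alerts.append(f"{run['name']}: flakiness {flakiness:.1%} above threshold")
    return alerts
```

In practice these checks would run against the observability data collected under #2184 and feed an alerting channel.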
- Extend test tooling dashboards for non-E2E level tests (&68)
  - Test execution time dashboard for non-E2E tests
  - Test flakiness metrics for non-E2E tests
  - Test economy metrics for non-E2E tests
  - Test infra availability metrics for non-E2E tests
- Onboard projects beyond gitlab.com in the test tooling dashboard
  - Onboard CustomersDot
  - Onboard Product Analytics
- Migrate existing Grafana dashboards to Snowflake
Track: Test Infrastructure Efficiency & Foundation
Problems to solve
- Failures in tests targeting stages like staging-canary (gstg-cny) and production (gprd) have significantly hindered deployments. According to the 2023 deployment metrics review, 59% of deployment blockers, totalling 190 incidents, were attributed to failing tests, with 112 of those identified as having flaky tests as the root cause.
- The effort required to maintain tests has escalated, with a notable increase in pipeline triage tasks.
- Historical data shows that test flakiness has led to critical defects slipping through to production, impacting user experience and requiring substantial resources for post-release patches and fixes.
- The ongoing issues with test reliability and deployment blockages have led to a degraded developer experience, characterised by delays, increased workload for debugging and fixing, and challenges in maintaining consistent development momentum.
- Elevated feedback time: some MR pipelines can take more than 1 hour 10 minutes. The prolonged time required to build Omnibus Docker images in the GitLab-QA orchestrator is a critical bottleneck, leading to delays in testing and development cycles. There is a significant opportunity to enhance the efficiency of cloud-native build pipelines.
- Despite having a suite of reliable orchestrated tests, our current setup does not leverage them to block MRs, leading to a significant gap.
- ⌛ Automate quarantining and promotion of reliable/blocking tests (#1918 (closed))
- Architectural changes required to improve test environment reliability
  - ⌛ Build ephemeral test-on-cng environments for better stability and speed (&49)
  - Evaluate cleanup policies and methods of live environments (#2376)
  - Investigate alternatives to overcome live environment limitations for test data
  - QA runners infrastructure deployment (#2205 (closed))
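The quarantine/promotion automation (#1918) is essentially a state machine driven by each test's recent pass/fail history. A hedged sketch, where the window sizes, thresholds, and state names are illustrative assumptions rather than the automation's actual configuration:

```python
# Hypothetical lifecycle rules for quarantining and promoting tests.
# Thresholds and state names are assumptions for illustration only.

QUARANTINE_FAILURE_RATE = 0.10   # quarantine tests failing >10% of recent runs
PROMOTE_PASS_STREAK = 100        # promote to blocking after 100 consecutive passes

def next_state(state: str, recent_results: list[bool]) -> str:
    """Return a test's next lifecycle state given its recent pass/fail history."""
    failures = recent_results.count(False)
    failure_rate = failures / len(recent_results) if recent_results else 0.0
    if failure_rate > QUARANTINE_FAILURE_RATE:
        return "quarantined"                 # too flaky: stop it blocking anyone
    if state == "quarantined" and failures == 0 and recent_results:
        return "non_blocking"                # passed consistently: leave quarantine
    if state == "non_blocking" and failures == 0 and len(recent_results) >= PROMOTE_PASS_STREAK:
        return "blocking"                    # reliable enough to block MRs
    return state
```

The point of automating this is to remove the manual triage step each time a test's reliability changes.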
- Selective test execution to run on MR pipelines based on changes made (&47)
  - Research refinements for selective execution algorithms
  - Integrate improved selective test execution into the package-and-test pipeline
  - Block merge requests on selected test failures
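At its core, change-based selection maps changed file paths to the test suites they can affect. A minimal sketch under that assumption; the glob rules and suite paths below are illustrative, not GitLab's actual selection algorithm:

```python
# Hypothetical change-to-test mapping for selective test execution.
# The patterns and suite paths are assumptions for illustration.
import fnmatch

# Glob patterns mapping source changes to test suite globs (assumed).
SELECTION_RULES = {
    "app/assets/**": ["qa/specs/features/browser_ui/**"],
    "app/models/**": ["qa/specs/features/api/**", "qa/specs/features/browser_ui/**"],
    "qa/**": ["qa/specs/**"],
}

def select_tests(changed_files: list[str]) -> set[str]:
    """Return the set of test suite globs to run for the given changed files."""
    selected: set[str] = set()
    for path in changed_files:
        for pattern, suites in SELECTION_RULES.items():
            if fnmatch.fnmatch(path, pattern):
                selected.update(suites)
    return selected
```

An empty selection would mean the expensive E2E stages can be skipped for that MR, which is where the feedback-time savings come from.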
- Improve parallel test execution to reduce feedback time and cost (&69)
- Decrease dependency on live environments by shifting testing left
  - Identify test dependencies on live environments
  - Audit existing pipelines for inefficient test scenarios
  - Devise a strategy to test .com-only features
  - Evaluate the impact of "allowed-to-fail" in pipelines
- Shift responsibility for fixing stale or broken tests to the merge request author
  - Identify and execute the steps necessary to shift accountability for test reliability maintenance
  - Make orchestrated tests blocking
  - Run end-to-end testing on all merge requests
  - Expand the test suite running against the gdk environment
- Auto-deployment test coverage is dependable and adequate (&50 (closed))
  - Adjust sanity criteria to reduce deployment-blocking inefficiencies (#2347 (closed))
  - Review smoke tests for appropriate coverage and runtime (#2347 (closed))
- CustomersDot
  - Enable CustomersDot E2E tests in staging-ref
  - Fulfillment-related deployments trigger CDot E2E tests
  - Run E2E tests in MRs in the CDot project
Track: AI Initiative Test Tooling
Problems to solve
- Performance-related incidents leading to outages (e.g. gitlab-com/gl-infra/production#14468 (closed))
- Uncertainty about the effectiveness of changes required to improve the quality of AI-generated responses and increase user satisfaction
- Uncertainty about the risk of regressions, particularly due to complex interactions between system components
- Inefficiencies in deploying/updating/managing test environments, especially when that involves system components like the AI gateway that are unfamiliar to most engineers
- ⌛ Comparative Analysis for GitLab Duo Features in an IDE Environment
  - POC to evaluate the latency comparison (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/2247)
  - ⌛ MVP to continuously analyse performance and code quality data in VS Code (https://gitlab.com/groups/gitlab-org/quality/quality-engineering/-/epics/63)
  - Expand the tooling for all supported IDEs
  - Expand the tooling to support all GitLab Duo Features
  - Expand to all competitive AI providers that offer code suggestions via extensions
- ⌛ Improve Code Suggestions performance testing
  - Set performance benchmarks for usage patterns
  - ⌛ AI GPT tests for the 1k, 10k and 50k reference architectures (&39)
- Penetration testing and code reviews focusing on AI-integrated features to ensure robust security
- Profile and optimize critical code paths in both the Web IDE and the VS Code Extension to ensure optimal loading and execution times
- Create a scalable user-based process to validate our automated evaluation of AI feature quality
- Make it easier to create test environments
- Cross-version testing to verify compatibility with different versions of GitLab and the current AI frameworks
- Consolidate the evaluation of response quality for Code Suggestions and Duo Chat
Track: Product Analytics
Problems to solve
- For CustomersDot, bugs still reach production even when automated tests catch them in staging, because of the 4-hour window between staging and production deployments.
- There is no system for early test feedback in merge requests (MRs), limiting engineers' ability to verify changes with E2E tests before deployment.
- Risk of breaking changes and low confidence when modifying Product Analytics. Testing and deployment pipelines for Product Analytics are incomplete: some parts, such as the frontend, the Product Analytics settings in GitLab, and the GraphQL APIs, are developed within the GitLab monolith and can be tested and deployed through the monolith's CI/CD.
- ⌛ The Product Analytics team is enabled to add E2E tests (&42)
- Product Analytics has a setup for Cube schema testing
- ⌛ Product Analytics has solid pipelines for testing and deployment (&43)
- Introduce contract testing for Product Analytics SDKs
- Mechanisms to run and evaluate load testing for Product Analytics
- Create a dashboard representing test metrics (e.g. number of tests run, tests passed/failed, current code coverage, etc.)
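The dashboard item above implies a per-run metrics record that pipelines would emit. A minimal sketch of what such a record could look like; the field names and values are assumptions for illustration, not an agreed schema:

```python
# Hypothetical per-run metrics record for a test metrics dashboard.
# Field names are assumptions for illustration only.
from dataclasses import dataclass, asdict

@dataclass
class TestRunMetrics:
    project: str
    tests_run: int
    tests_passed: int
    tests_failed: int
    coverage_percent: float

    @property
    def pass_rate(self) -> float:
        """Fraction of executed tests that passed (0.0 when nothing ran)."""
        return self.tests_passed / self.tests_run if self.tests_run else 0.0

# Example record a pipeline job might emit for dashboard ingestion.
metrics = TestRunMetrics("product-analytics", 120, 114, 6, 83.5)
payload = asdict(metrics)
```

Agreeing on a record shape like this early would let the dashboard ingest data from multiple pipelines consistently.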
Track: Process Improvement / Housekeeping
Housekeeping is a practice that emphasizes the organization, cleanliness, and maintenance of software projects to ensure their efficiency, consistency, and long-term viability. Read more on housekeeping work here.
- Automate credential rotation for the GitLab Testing Infrastructure (#2316)
- Remove the rspec-retry gem in EE tests (#2268 (closed))
- E2E test pipeline auditing (#2517 (comment 1821324281))
- Simplify the process of pushing end-to-end test metrics data to the data warehouse (&72 (closed))
- Complete the handover of the feature and unit testing pipeline from the Engineering Productivity team
- Reduce the Data Seeder's namespace factory error rate
- Organize and prioritize documentation improvements (e.g., this old epic and all of these issues)
Edited by Abhinaba Ghosh