FY25 Roadmap - Test and Tools Infrastructure
Welcome to the Test and Tools Infrastructure team's FY25 roadmap. We remain open to your feedback and will adjust our plans accordingly.
Track: Test Tooling Dashboard & Monitoring

- ⌛ Establish Observability on Test Reliability and Efficiency (#2184)
  - Build observability for Test Execution Time Metrics (#2244 (closed))
  - Build observability for Test Economy Metrics (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/work_items/2245)
  - Build observability for Test Flakiness Quotient Metrics (#2259 (closed))
  - Build observability for Test Infra Availability Metrics (#2260 (closed))
  - Add alerts when the test pass rate or flakiness rate crosses certain thresholds to improve visibility of bad tests (#2591)
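The proposed alerting (#2591) amounts to a threshold check over recent run data. A minimal sketch, assuming hypothetical metric fields (`passed`, `failed`, `flaky`) and illustrative thresholds, not the actual implementation:

```python
# Hypothetical threshold check for test pass-rate/flakiness alerts.
# Metric field names and threshold values are assumptions for illustration.

PASS_RATE_MIN = 0.95   # alert when pass rate drops below 95%
FLAKINESS_MAX = 0.02   # alert when flakiness rate exceeds 2%

def evaluate_test_health(runs: list[dict]) -> list[str]:
    """Return alert messages for tests crossing the configured thresholds."""
    alerts = []
    for run in runs:
        total = run["passed"] + run["failed"]
        if total == 0:
            continue
        pass_rate = run["passed"] / total
        flakiness = run["flaky"] / total
        if pass_rate < PASS_RATE_MIN:
            alerts.append(f"{run['name']}: pass rate {pass_rate:.1%} below threshold")
        if flakiness > FLAKINESS_MAX:
            alerts.append(f"{run['name']}: flakiness {flakiness:.1%} above threshold")
    return alerts
```

In practice these checks would run against the observability data collected under #2184 and feed an alerting channel.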
- Extend test tooling dashboards for non-E2E level tests (&68)
  - Test execution time dashboard for non-E2E tests
  - Test flakiness metrics for non-E2E tests
  - Test economy metrics for non-E2E tests
  - Test infra availability metrics for non-E2E tests
- Onboard projects beyond gitlab.com in the test tooling dashboard
  - Onboard CustomersDot
  - Onboard Product Analytics
- Migrate existing Grafana dashboards to Snowflake
Track: Test Infrastructure Efficiency & Foundation
Problems to solve
- Failures in tests targeting stages like staging-canary (gstg-cny) and production (gprd) have significantly hindered deployments. According to the 2023 deployment metrics review, 59% of deployment blockers, totalling 190 incidents, were attributed to failing tests, with 112 of those identified as having flaky tests as the root cause.
- The effort required to maintain tests has escalated, with a notable increase in pipeline triage tasks.
- Historical data shows that test flakiness has led to critical defects slipping through to production, impacting user experience and requiring substantial resources for post-release patches and fixes.
- The ongoing issues with test reliability and deployment blockages have led to a degraded developer experience, characterised by delays, increased workload for debugging and fixing, and challenges in maintaining consistent development momentum.
- Elevated feedback time: some MR pipelines can take more than 1 hour 10 minutes. The prolonged time required to build Omnibus Docker images in the GitLab-QA orchestrator is a critical bottleneck, leading to delays in testing and development cycles. There is a significant opportunity to enhance the efficiency of cloud-native build pipelines.
- Despite having a suite of reliable orchestrated tests, our current setup does not leverage them to block MRs, leading to a significant gap.
- ⌛ Automate quarantining and promotion of reliable/blocking tests (#1918 (closed))
- Architectural changes required to improve test environment reliability
  - ⌛ Build ephemeral test-on-cng environments for better stability and speed (&49)
  - Evaluate cleanup policies and methods of live environments (#2376)
  - Investigate alternatives to overcome live environment limitations for test data
  - QA runners infrastructure deployment (#2205 (closed))
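The quarantine/promotion automation (#1918) is essentially a state machine driven by each test's recent pass/fail history. A hedged sketch, where the window sizes, thresholds, and state names are illustrative assumptions rather than the automation's actual configuration:

```python
# Hypothetical lifecycle rules for quarantining and promoting tests.
# Thresholds and state names are assumptions for illustration only.

QUARANTINE_FAILURE_RATE = 0.10   # quarantine tests failing >10% of recent runs
PROMOTE_PASS_STREAK = 100        # promote to blocking after 100 consecutive passes

def next_state(state: str, recent_results: list[bool]) -> str:
    """Return a test's next lifecycle state given its recent pass/fail history."""
    failures = recent_results.count(False)
    failure_rate = failures / len(recent_results) if recent_results else 0.0
    if failure_rate > QUARANTINE_FAILURE_RATE:
        return "quarantined"                 # too flaky: stop it blocking anyone
    if state == "quarantined" and failures == 0 and recent_results:
        return "non_blocking"                # passed consistently: leave quarantine
    if state == "non_blocking" and failures == 0 and len(recent_results) >= PROMOTE_PASS_STREAK:
        return "blocking"                    # reliable enough to block MRs
    return state
```

The point of automating this is to remove the manual triage step each time a test's reliability changes.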
- Selective test execution to run on MR pipelines based on changes made (&47)
  - Research refinements for selective execution algorithms
  - Integrate improved selective test execution into the package-and-test pipeline
  - Block merge requests on selected test failures
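At its core, change-based selection maps changed file paths to the test suites they can affect. A minimal sketch under that assumption; the glob rules and suite paths below are illustrative, not GitLab's actual selection algorithm:

```python
# Hypothetical change-to-test mapping for selective test execution.
# The patterns and suite paths are assumptions for illustration.
import fnmatch

# Glob patterns mapping source changes to test suite globs (assumed).
SELECTION_RULES = {
    "app/assets/**": ["qa/specs/features/browser_ui/**"],
    "app/models/**": ["qa/specs/features/api/**", "qa/specs/features/browser_ui/**"],
    "qa/**": ["qa/specs/**"],
}

def select_tests(changed_files: list[str]) -> set[str]:
    """Return the set of test suite globs to run for the given changed files."""
    selected: set[str] = set()
    for path in changed_files:
        for pattern, suites in SELECTION_RULES.items():
            if fnmatch.fnmatch(path, pattern):
                selected.update(suites)
    return selected
```

An empty selection would mean the expensive E2E stages can be skipped for that MR, which is where the feedback-time savings come from.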
- Improve parallel test execution to reduce feedback time and cost (&69)
- Decrease dependency on live environments by shifting testing left
  - Identify test dependencies on live environments
  - Audit existing pipelines for inefficient test scenarios
  - Devise a strategy to test .com-only features
  - Evaluate the impact of "allowed-to-fail" in pipelines
- Shift responsibility for fixing stale or broken tests to the merge request author
  - Identify and execute the steps necessary to shift accountability for test reliability maintenance
  - Make orchestrated tests blocking
  - Run end-to-end testing on all merge requests
  - Expand the test suite running against the gdk environment
- Auto-deployment test coverage is dependable and adequate (&50 (closed))
  - Adjust sanity criteria to reduce deployment-blocking inefficiencies (#2347 (closed))
  - Review smoke tests for appropriate coverage and runtime (#2347 (closed))
- CustomersDot
  - Enable CustomersDot E2E tests in staging-ref
  - Fulfillment-related deployments trigger CDot E2E tests
  - Run E2E tests in MRs in the CDot project
Track: AI Initiative Test Tooling
Problems to solve
- Performance-related incidents leading to outages (e.g. gitlab-com/gl-infra/production#14468 (closed))
- Uncertainty about the effectiveness of changes required to improve the quality of AI-generated responses and increase user satisfaction
- Uncertainty about the risk of regressions, particularly due to complex interactions between system components
- Inefficiencies in deploying/updating/managing test environments, especially when that involves system components like the AI gateway that are unfamiliar to most engineers
- ⌛ Comparative Analysis for GitLab Duo Features in an IDE Environment
  - POC to evaluate the latency comparison (https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/2247)
  - ⌛ MVP to continuously analyse performance and code quality data in VS Code (https://gitlab.com/groups/gitlab-org/quality/quality-engineering/-/epics/63)
  - Expand the tooling for all supported IDEs
  - Expand the tooling to support all GitLab Duo Features
  - Expand to all competitive AI providers that offer code suggestions via extensions
- ⌛ Improve Code Suggestions performance testing
  - Set performance benchmarks for usage patterns
  - ⌛ AI GPT tests for the 1k, 10k and 50k reference architectures (&39)
- Penetration testing and code reviews focusing on AI-integrated features to ensure robust security
- Profile and optimize critical code paths in both the Web IDE and the VS Code Extension to ensure optimal loading and execution times
- Create a scalable user-based process to validate our automated evaluation of AI feature quality
- Make it easier to create test environments
- Cross-version testing to verify compatibility with different versions of GitLab and the current AI frameworks
- Consolidate the evaluation of response quality for Code Suggestions and Duo Chat
Track: Product Analytics
Problems to solve
- For CustomersDot, bugs still reach production even when automated tests catch them in staging, because of the 4-hour window between staging and production deployments.
- There is no system for early test feedback in merge requests (MRs), limiting engineers' ability to verify changes with E2E tests before deployment.
- Risk of breaking changes and low confidence when modifying Product Analytics. Testing and deployment pipelines for Product Analytics are incomplete: some parts, such as the frontend, the Product Analytics settings in GitLab, and the GraphQL APIs, are developed within the GitLab monolith and can be tested and deployed through the monolith's CI/CD.
- ⌛ The Product Analytics team is enabled to add E2E tests (&42)
- Product Analytics has a setup for Cube schema testing
- ⌛ Product Analytics has solid pipelines for testing and deployment (&43)
- Introduce contract testing for Product Analytics SDKs
- Mechanisms to run and evaluate load testing for Product Analytics
- Create a dashboard representing test metrics (e.g. number of tests run, tests passed/failed, current code coverage, etc.)
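The dashboard item above implies a per-run metrics record that pipelines would emit. A minimal sketch of what such a record could look like; the field names and values are assumptions for illustration, not an agreed schema:

```python
# Hypothetical per-run metrics record for a test metrics dashboard.
# Field names are assumptions for illustration only.
from dataclasses import dataclass, asdict

@dataclass
class TestRunMetrics:
    project: str
    tests_run: int
    tests_passed: int
    tests_failed: int
    coverage_percent: float

    @property
    def pass_rate(self) -> float:
        """Fraction of executed tests that passed (0.0 when nothing ran)."""
        return self.tests_passed / self.tests_run if self.tests_run else 0.0

# Example record a pipeline job might emit for dashboard ingestion.
metrics = TestRunMetrics("product-analytics", 120, 114, 6, 83.5)
payload = asdict(metrics)
```

Agreeing on a record shape like this early would let the dashboard ingest data from multiple pipelines consistently.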
Track: Process Improvement / Housekeeping
Housekeeping is a practice that emphasizes the organization, cleanliness, and maintenance of software projects to ensure their efficiency, consistency, and long-term viability. Read more on housekeeping work here.
- Automate credential rotation for the GitLab Testing Infrastructure (#2316)
- Remove the rspec-retry gem in EE tests (#2268 (closed))
- E2E test pipeline auditing (#2517 (comment 1821324281))
- Simplify the process of pushing end-to-end test metrics data to the data warehouse (&72 (closed))
- Complete the handover of the feature and unit testing pipeline from the Engineering Productivity team
- Reduce the Data Seeder's namespace factory error rate
- Organize and prioritize documentation improvements (e.g., this old epic and all of these issues)
Edited by Abhinaba Ghosh