Show health of QA test suite over time - Total/Reliable/Quarantined
With the recent S1/S2 incidents related to tests, we need immediate visibility into the health of our QA test suite.
We already have test-session-based reporting in https://gitlab.com/gitlab-org/quality/testcase-sessions/-/issues and have found the reports very valuable, e.g.
- gitlab-org/quality/testcase-sessions#22458 (closed)
Get a historical chart of the full suite in staging
- Total: 350 tests
- Passed: 315 tests
- Failed: ~17 tests
- Other: 18 tests (usually skipped)
We need a holistic view of the health of the test suite and a drive to increase the number of reliable tests.
Plan
Get a historical chart of the full suite in staging that shows:
- Total
- Status
  - Passing
  - Failed
- Health
  - Reliable
  - Quarantined
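The breakdown above could be tallied per run roughly as follows. This is a minimal sketch, not the actual implementation: the `TestResult` record, its field names, and the `summarize` helper are hypothetical and only illustrate which counts the chart would plot.

```ruby
# Hypothetical per-test record; the field names are illustrative only.
TestResult = Struct.new(:name, :status, :reliable, :quarantined, keyword_init: true)

# Tally the figures the chart would plot for a single run.
def summarize(results)
  {
    total: results.size,
    passed: results.count { |r| r.status == :passed },
    failed: results.count { |r| r.status == :failed },
    # Anything that is neither passed nor failed (usually skipped tests).
    other: results.count { |r| !%i[passed failed].include?(r.status) },
    reliable: results.count(&:reliable),
    quarantined: results.count(&:quarantined)
  }
end

# Made-up sample data, only to show the shape of the input.
results = [
  TestResult.new(name: "login", status: :passed, reliable: true, quarantined: false),
  TestResult.new(name: "create project", status: :passed, reliable: false, quarantined: false),
  TestResult.new(name: "merge request", status: :failed, reliable: false, quarantined: false),
  TestResult.new(name: "import", status: :pending, reliable: false, quarantined: true)
]

puts summarize(results)
```

Storing one such summary per pipeline run is what would let the chart show the trend over time.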
Implementation
After considering possible implementation options, exporting metrics to an instance of InfluxDB and creating dashboards in Grafana was chosen as the most complete and flexible option, which can be expanded with other kinds of test metrics tracking in the future.
Quality will have to own these parts of the infrastructure, since the existing Grafana instance is not meant for custom data sources and mostly tracks gitlab.com metrics.
Dashboards URL: http://dashboards.quality.gitlab.net
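To make the export step more concrete, here is a minimal sketch of how one run's counts could be encoded as a point in InfluxDB line protocol (`measurement,tags fields timestamp`). The measurement name `test_suite_health`, the `env` tag, and the helper names are assumptions made for illustration; the counts are the example figures quoted earlier in this issue. Sending the line to the server (for example through a client library or the HTTP write endpoint) is deliberately left out.

```ruby
# Escape characters that are special in line-protocol tag and field keys/values.
def escape_key(value)
  value.to_s.gsub(/[,= ]/) { |m| "\\#{m}" }
end

# Measurement names only need commas and spaces escaped.
def escape_measurement(value)
  value.to_s.gsub(/[, ]/) { |m| "\\#{m}" }
end

# Integers carry an "i" suffix, strings are double-quoted,
# floats and booleans are written as-is.
def format_field(value)
  case value
  when Integer then "#{value}i"
  when Float, true, false then value.to_s
  else
    escaped = value.to_s.gsub(/["\\]/) { |m| "\\#{m}" }
    %("#{escaped}")
  end
end

# Build a single line-protocol point with a nanosecond timestamp.
def to_line(measurement, tags, fields, time)
  tag_part = tags.map { |k, v| ",#{escape_key(k)}=#{escape_key(v)}" }.join
  field_part = fields.map { |k, v| "#{escape_key(k)}=#{format_field(v)}" }.join(",")
  timestamp_ns = time.to_i * 1_000_000_000 + time.nsec
  "#{escape_measurement(measurement)}#{tag_part} #{field_part} #{timestamp_ns}"
end

line = to_line(
  "test_suite_health",                                   # hypothetical measurement name
  { env: "staging" },                                    # hypothetical tag
  { total: 350, passed: 315, failed: 17, other: 18 },    # example figures from this issue
  Time.at(1_600_000_000).utc
)
puts line
# prints: test_suite_health,env=staging total=350i,passed=315i,failed=17i,other=18i 1600000000000000000
```

Keeping the environment as a tag rather than a field would let a Grafana panel filter or group by it.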
Tasks
- Add a custom rspec formatter for pushing test metrics to an instance of InfluxDB in the qa testing framework and update pipeline configurations accordingly
- Adjust the existing https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure project and add deployment of InfluxDB and Grafana to the GCP project gitlab-qa-resources
- Get a review from the infrastructure team
- Set up some kind of DNS name for the InfluxDB instance and Grafana
- Set up some kind of authentication for the Grafana instance for the Quality department (a single read-only user could be used while this is being set up)
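As an illustration of the first task, a custom rspec formatter could look roughly like the sketch below. It is not the formatter that was actually implemented: the class name `TestMetricsFormatter` and the record layout are hypothetical, and the `:reliable` and `:quarantine` metadata keys are assumed tag names. The sketch only collects one record per example and writes it to the formatter output; pushing the records to InfluxDB is left as a placeholder.

```ruby
# Hypothetical formatter that turns each finished example into a metrics record.
class TestMetricsFormatter
  # Register with RSpec only when it is loaded, so the class can also be
  # exercised on its own.
  RSpec::Core::Formatters.register(self, :stop) if defined?(RSpec::Core::Formatters)

  def initialize(output)
    @output = output
  end

  # RSpec calls this once, after all examples have run.
  def stop(notification)
    notification.examples.each do |example|
      # Placeholder: a real formatter would push the record to InfluxDB here.
      @output.puts(record_for(example).inspect)
    end
  end

  # Map an RSpec example to a flat record.
  def record_for(example)
    {
      name: example.full_description,
      file: example.file_path,
      status: example.execution_result.status, # :passed, :failed or :pending
      run_time: example.execution_result.run_time,
      reliable: example.metadata.key?(:reliable),      # assumed tag name
      quarantined: example.metadata.key?(:quarantine)  # assumed tag name
    }
  end
end
```

Assuming the class is saved as `test_metrics_formatter.rb`, it could be enabled with `rspec --require ./test_metrics_formatter.rb --format TestMetricsFormatter`. Reporting in the `stop` hook means a failure to deliver metrics would happen after the examples have finished, so it would not interrupt the test run itself.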