
Show health of QA test suite over time - Total/Reliable/Quarantined

With the recent S1/S2 incidents related to tests, we need immediate visibility into the health of our QA test suite.

We already have test-session-based reporting in https://gitlab.com/gitlab-org/quality/testcase-sessions/-/issues, and we have found these reports very valuable, e.g.:

Get a historical chart of the full suite in staging
  • Total: 350 tests
  • Passed: 315 tests
  • Failed: ~17 tests
  • Other: 18 tests (usually skipped)
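As a quick sanity check on the example report, the three buckets add up to the reported total, and the pass rate follows directly from the counts (the failed count is approximate in the report, so this is illustrative only):

```python
# Counts taken from the staging session report above.
# "other" covers tests that neither passed nor failed (usually skipped).
counts = {"passed": 315, "failed": 17, "other": 18}

total = sum(counts.values())          # should match the reported total of 350
pass_rate = counts["passed"] / total  # share of the full suite that passed

print(total, f"{pass_rate:.1%}")
```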

We need a holistic view of the test suite's health over time, and a push to increase the number of reliable tests.

Plan

Get a historical chart of the full suite in staging that shows

  • Total
  • Status
    • Passing
    • Failed
  • Health
    • Reliable
    • Quarantined
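The chart boils down to a few numbers per staging run. A minimal Python sketch of collapsing one run's results into those numbers, assuming each result carries a status string and a set of tags; the `SpecResult` record, the `summarize` helper and the tag names `reliable` and `quarantine` are hypothetical, not the QA framework's actual data model:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecResult:
    """One test's outcome in a single run (hypothetical record shape)."""
    name: str
    status: str                      # "passed", "failed", "skipped", ...
    tags: frozenset = frozenset()    # assumed tag names: "reliable", "quarantine"


def summarize(results: list) -> dict:
    """Collapse one run into the Total / Status / Health numbers from the plan."""
    status = Counter(r.status for r in results)
    return {
        "total": len(results),
        "passed": status["passed"],
        "failed": status["failed"],
        # everything else (usually skipped) is reported as "other"
        "other": len(results) - status["passed"] - status["failed"],
        "reliable": sum(1 for r in results if "reliable" in r.tags),
        "quarantined": sum(1 for r in results if "quarantine" in r.tags),
    }
```

Storing one such summary per run gives one point per series on the historical chart.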

Implementation

After considering several implementation options, we chose to export metrics to an InfluxDB instance and build dashboards in Grafana. This was the most complete and flexible option, and it can be extended to track other kinds of test metrics in the future.
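InfluxDB ingests data as line protocol: a measurement name, optional comma-separated tags, one or more fields, and an optional timestamp. A minimal Python sketch of encoding one test result this way; the measurement name `test-stats` and all tag and field names are hypothetical examples, not the schema actually used:

```python
def _escape_key(s: str) -> str:
    # Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value) -> str:
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"        # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Everything else is written as a double-quoted string field.
    s = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Encode one point in InfluxDB line protocol."""
    m = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Tags are sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(f",{_escape_key(k)}={_escape_key(str(v))}"
                       for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape_key(k)}={_format_field(v)}"
                          for k, v in fields.items())
    line = f"{m}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Keeping low-cardinality values such as status or run type as tags and per-test values such as run time as fields keeps the data easy to group in Grafana.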

Quality will have to own this part of the infrastructure, since the existing Grafana instance is not meant for custom data sources and mostly tracks GitLab.com metrics.

Dashboards URL: http://dashboards.quality.gitlab.net

Tasks

  • Add a custom RSpec formatter to the QA testing framework that pushes test metrics to the InfluxDB instance, and update pipeline configurations accordingly
  • Adjust the existing https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure project to deploy InfluxDB and Grafana to the GCP project gitlab-qa-resources
  • Get a review from the Infrastructure team
  • Set up DNS names for the InfluxDB and Grafana instances
  • Set up authentication to the Grafana instance for the Quality department (a single read-only user could be used while this is being set up)
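The formatter itself would live in the Ruby QA framework; the push step it performs can be sketched in Python. This assumes an InfluxDB 2.x instance, whose HTTP write endpoint is `/api/v2/write` with `org`, `bucket` and `precision` query parameters and a token in the `Authorization` header. The host, org, bucket and token below are hypothetical placeholders:

```python
import urllib.parse
import urllib.request


def build_write_request(base_url: str, org: str, bucket: str, token: str,
                        lines: list) -> urllib.request.Request:
    """Build (but do not send) an InfluxDB 2.x write request for a batch of points."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    body = "\n".join(lines).encode("utf-8")   # one line-protocol point per line
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


def push(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request; a successful write returns 204 No Content.

    urlopen raises urllib.error.HTTPError for error responses.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Usage (hypothetical values; not executed here):
# req = build_write_request("https://influxdb.example.com", "gitlab-qa",
#                           "e2e-test-stats", "<token>", ["test-stats,name=login run_time=1234i"])
# push(req)
```

Sending one batched request at the end of a run, rather than one request per test, keeps the extra load on the pipeline small.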
[Image: example dashboard for staging runs]

@gl-quality/managers

Edited by Ramya Authappan