Consider labelling test failure issues with a cause so that we can track the reasons for test failures

Problem

Right now we have to read through many failure issues if we want to figure out what proportion of tests fail for what reasons. And if we wanted to track how that changes over time, we'd have to spend too much time manually tallying each failure.

Proposal

When a test failure issue is closed because the failure no longer occurs (and the test has been taken out of quarantine), apply a label to indicate the cause of the failure.

Use insights to track the causes of failures over time.

If we know what the most common causes of test failures are, we'll be better able to focus on coming up with the most effective solutions, and we'll have data to verify that they work.

Details

We currently have these labels:

  • ~"Quality:flaky-tests"
  • ~bug

We sometimes use ~bug when an intended change makes a test fail because the test needs to be updated. But I think it would be useful to differentiate between those instances and bugs that are unintentionally introduced (because we can prevent the former from breaking tests, but there's not much we can do about the latter). Something like ~"test-needed-update"?

We also don't have something for when there's a bug in the test that causes consistent failures (~"Quality:flaky-tests" isn't accurate in those cases). Something like ~"bug-in-test"?

Or we could be more specific about using the labels to track the cause of a failure. E.g.:

  • ~"cause::bug-in-application"
  • ~"cause::bug-in-test"
  • ~"cause::flaky-test"
  • ~"cause::infrastructure-issue"
  • ~"cause::test-needed-update"
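To illustrate the kind of tally these labels would make possible, here's a rough sketch in Python. It assumes each closed failure issue is available as a dict with a `labels` list and an ISO 8601 `closed_at` timestamp; that shape, the `tally_causes_by_month` helper, and the "unlabelled" bucket are all hypothetical and only for illustration, not an existing tool.

```python
from collections import Counter, defaultdict

# Prefix of the proposed labels, e.g. "cause::flaky-test".
CAUSE_PREFIX = "cause::"


def tally_causes_by_month(issues):
    """Count closed failure issues per month ("YYYY-MM") and per cause.

    `issues` is assumed to be an iterable of dicts with a "labels" list
    and a "closed_at" ISO 8601 string (or None if the issue is open).
    """
    tallies = defaultdict(Counter)
    for issue in issues:
        closed_at = issue.get("closed_at")
        if not closed_at:
            continue  # still open, so no cause has been recorded yet
        month = closed_at[:7]  # "2020-03-04T10:00:00Z" -> "2020-03"
        causes = [
            label[len(CAUSE_PREFIX):]
            for label in issue.get("labels", [])
            if label.startswith(CAUSE_PREFIX)
        ]
        # Closed issues without a cause label are counted separately
        # so that gaps in labelling stay visible.
        for cause in causes or ["unlabelled"]:
            tallies[month][cause] += 1
    return {month: dict(counts) for month, counts in sorted(tallies.items())}


if __name__ == "__main__":
    # Made-up sample data, only to show the output shape.
    sample = [
        {"labels": ["cause::flaky-test"], "closed_at": "2020-03-04T10:00:00Z"},
        {"labels": ["cause::bug-in-test"], "closed_at": "2020-03-20T08:30:00Z"},
        {"labels": ["cause::flaky-test"], "closed_at": "2020-04-01T12:00:00Z"},
    ]
    for month, counts in tally_causes_by_month(sample).items():
        print(month, counts)
```

With one label per cause, the per-month counts fall out of a simple group-by, which is the part we currently have to do by hand.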

WDYT @gl-quality?
