Track when a Pipeline has more than 200 failed tests in Usage ping

Overview

We should monitor how often our trackable limit of 200 failures per pipeline is hit so we can have better insight into how this limit affects our users.

As a PM for the Test History feature, I want to know if users are frequently hitting the 200 test failures limit so I can figure out if that is an effective limit.

Proposal

We should use usage ping counters (non-HLL, since we don't want unique counts) to count how many times pipelines hit the trackable limit of 200 failures.

We should use a key like the following in known_events.yml:

- name: redis_hll_counters.testing.i_testing_failures_exceed_pipeline_limit_weekly
  category: testing
  redis_slot: testing
  aggregation: weekly

The key has already been added to the event dictionary (internal link), but double-check it against the new metrics definition information.

We should increment the counter when we detect that we have exceeded the pipeline limit.
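
As a rough illustration of the increment step, here is a minimal, self-contained Ruby sketch. It is not GitLab's actual usage ping API: the class name `FailureLimitCounter`, the `track` method, the in-memory hash standing in for Redis, and the shortened event name are all hypothetical. It assumes "exceeded" means strictly more than 200 failed tests, as in the issue title.

```ruby
# Hypothetical sketch only; none of these names come from the GitLab codebase.
class FailureLimitCounter
  # Trackable limit of failed tests per pipeline (from this issue).
  TRACKABLE_FAILURE_LIMIT = 200

  # Assumed event name, shortened from the key proposed for known_events.yml.
  EVENT_NAME = 'i_testing_failures_exceed_pipeline_limit'

  attr_reader :counts

  def initialize
    # In-memory stand-in for the real counter store (an assumption).
    @counts = Hash.new(0)
  end

  # Increments the plain (non-unique) counter when a pipeline reports more
  # failed tests than the trackable limit. Returns true if it incremented.
  def track(failed_test_count)
    return false unless failed_test_count > TRACKABLE_FAILURE_LIMIT

    @counts[EVENT_NAME] += 1
    true
  end
end

counter = FailureLimitCounter.new
counter.track(150) # below the limit, not counted
counter.track(201) # above the limit, counted
counter.track(350) # above the limit, counted
puts counter.counts[FailureLimitCounter::EVENT_NAME]
```

Because this is a plain counter rather than a unique (HLL) one, every occurrence adds to the total, including repeated reports for the same pipeline.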

Original

The following discussion from !44047 (closed) should be addressed:

@morefice started a discussion:

question: Do we want to send an event to dataland every time we track those failures?

Edited Apr 29, 2021 by James Heimbuck