Track when a pipeline has more than 200 failed tests in Usage Ping
Overview
We should monitor how often pipelines hit our trackable limit of 200 failures so we have better insight into how this limit affects our users.
As a PM for the Test History feature, I want to know if users are frequently hitting the 200 test failures limit so I can figure out if that is an effective limit.
Proposal
We should use Usage Ping counters (non-HLL, as we don't want unique counts) to count how many times pipelines hit the trackable limit of 200 failures.
We should use a key like the following in known_events.yml:
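To illustrate why a non-unique counter is wanted here, the following minimal sketch contrasts a plain count with a unique count. The project IDs and variable names are hypothetical, and a Python set stands in for the unique-value estimate an HLL counter would give; this is not the actual Usage Ping implementation.

```python
# Hypothetical data: one entry per pipeline that hit the limit in a week,
# recorded by the ID of the project the pipeline belongs to.
limit_hits_by_project = [101, 102, 101, 103]

# Plain (non-unique) counter: every occurrence counts, which answers
# "how often is the limit hit?"
total_hits = len(limit_hits_by_project)

# Unique counter: what an HLL-style counter approximates (exact here via a set),
# which would only answer "how many distinct projects hit the limit?"
unique_projects = len(set(limit_hits_by_project))

print(total_hits, unique_projects)
```

The plain count keeps repeated hits from the same project, which is the frequency signal this issue asks for.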
name: redis_hll_counters.testing.i_testing_failures_exceed_pipeline_limit_weekly
category: testing
redis_slot: testing
aggregation: weekly
The key has already been added to the event dictionary (internal link), but double-check it against the new metrics definition information.
We should increment the counter when we detect that we have exceeded the pipeline limit.
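The increment step could look roughly like the sketch below. Only the limit of 200 comes from this issue; the function name, the shortened event name, and the in-memory `Counter` standing in for the real counter store are assumptions made for illustration.

```python
from collections import Counter

# The limit of 200 is taken from this issue; all other names are hypothetical.
TRACKABLE_FAILURES_LIMIT = 200
EVENT_NAME = "i_testing_failures_exceed_pipeline_limit"

# In-memory stand-in for the real Usage Ping counter store.
counters = Counter()


def track_failures(failed_test_count: int) -> bool:
    """Return True if the failures are within the limit; otherwise count a limit hit."""
    if failed_test_count > TRACKABLE_FAILURES_LIMIT:
        # One increment per pipeline that exceeds the limit, not one per failure.
        counters[EVENT_NAME] += 1
        return False
    return True


# Four hypothetical pipelines; only the last two exceed the limit.
for failures in (12, 200, 201, 950):
    track_failures(failures)

print(counters[EVENT_NAME])
```

Incrementing once per pipeline, rather than once per failed test, keeps the number of counter writes bounded by the number of pipelines.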
Original
The following discussion from !44047 (closed) should be addressed:
- @morefice started a discussion (+3 comments): question: Do we want to send an event to dataland every time we track those failures?