Define a process for investigating groups with suspiciously high availability

Currently, most groups have an availability score well above 99.95%, mostly because all requests are graded against a 5s target duration. This will improve once groups have opted in to the new metrics (&525), which use a target duration based on the urgency of the endpoint.
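To illustrate why a single 5s target inflates the score, here is a minimal sketch in Python. It is a simplification and not the actual availability calculation: a request counts as "good" when it did not fail and finished within its target. The `Request` type, the `availability` function, and the 1s per-endpoint targets are all hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float      # how long the request took, in seconds
    target_s: float        # hypothetical urgency-based target for its endpoint
    failed: bool = False   # True when the request ended in an error


def availability(requests, fixed_target_s=None):
    """Return the fraction of requests that succeeded within their target.

    When fixed_target_s is given, every request is graded against that one
    target; otherwise each request is graded against its own target_s.
    """
    if not requests:
        raise ValueError("no requests to grade")
    good = 0
    for r in requests:
        target = fixed_target_s if fixed_target_s is not None else r.target_s
        if not r.failed and r.duration_s <= target:
            good += 1
    return good / len(requests)


# Made-up sample data: three requests to endpoints with an assumed 1s target,
# one request to an endpoint with a 5s target.
sample = [
    Request(duration_s=0.2, target_s=1.0),
    Request(duration_s=3.0, target_s=1.0),  # too slow for a 1s target
    Request(duration_s=4.5, target_s=5.0),
    Request(duration_s=0.5, target_s=1.0),
]

fixed = availability(sample, fixed_target_s=5.0)  # everything graded against 5s
per_endpoint = availability(sample)               # graded against each target_s
```

With this sample, grading everything against 5s counts all four requests as good, while grading against the per-endpoint targets marks the 3.0s request as too slow, so the score drops.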

In #1500 (closed) we saw a case where groups had increased availability because of a bug. This went uncaught until we went looking for something unrelated.

Once &525 is finalized and the availability numbers are more realistic, we should also have a process for investigating availability numbers that are higher than expected.

Edited Jan 19, 2022 by Bob Van Landuyt