Define a process for investigating groups with suspiciously high availability
Currently, most groups have an availability score well above 99.95%, mostly because all requests are graded against a 5s target duration. This will improve once groups have opted in to the new metrics (&525), which use a target duration based on the urgency of the endpoint.
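To illustrate why a flat 5s target inflates the score, here is a minimal sketch that grades the same requests against a flat target and against per-urgency targets. The urgency labels, the target values in `TARGETS_S`, and the sample durations are made up for illustration and are not taken from &525. It also only covers the duration part of the score and ignores errors.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Request:
    duration_s: float
    urgency: str  # hypothetical urgency label for the endpoint


# Hypothetical urgency-to-target mapping, in seconds (assumed values).
TARGETS_S = {"high": 0.25, "default": 1.0, "low": 5.0}


def satisfied_ratio(
    requests: Sequence[Request],
    target_for: Callable[[Request], float],
) -> float:
    """Fraction of requests that finished within their target duration."""
    satisfied = sum(1 for r in requests if r.duration_s <= target_for(r))
    return satisfied / len(requests)


# Made-up sample of request durations.
requests = [
    Request(0.1, "high"),
    Request(0.8, "high"),
    Request(0.9, "default"),
    Request(3.0, "default"),
    Request(4.5, "low"),
]

# Every request is graded against the same 5s target.
flat = satisfied_ratio(requests, lambda r: 5.0)

# Each request is graded against the target for its endpoint's urgency.
by_urgency = satisfied_ratio(requests, lambda r: TARGETS_S[r.urgency])

print(f"flat 5s target:     {flat:.2f}")
print(f"per-urgency target: {by_urgency:.2f}")
```

With this sample, every request passes the flat 5s target, while two of the five miss their per-urgency target, so the second score is noticeably lower for the same traffic.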
In #1500 (closed) we saw a case where groups had increased availability because of a bug. This went uncaught until we went looking for something unrelated.
Once &525 is finalized and the availability numbers are more realistic, we should also have a process for investigating availability numbers that are higher than expected.
Edited by Bob Van Landuyt