Reduce Utilization error "spend"
Problem
In the 2021-09-04 error budget report, group::utilization is highlighted as one of the top groups "over budget".
Cause
From a review of our stage dashboard and the associated Kibana logs (internal links), this looks to be the result of the UpdateHighestRoleWorker having a high error rate due to validation errors.
There may be other offenders, but this seems the likely culprit based on my initial investigation. If we find more, we can add them to this issue and potentially promote it to an epic for tracking purposes.
Proposed solution
As a first step, we can reduce our error budget spend by identifying where and why the validation error occurs, and then either fixing it (if it's a bug) or handling the error more gracefully.
The failing call looks likely to be this update!, so we can probably start our investigation there.