Clarify documentation around modifying Internal Metrics
Problem
It happens from time to time that MRs are proposed and merged that modify existing metrics (e.g. !115416 (merged), !117939 (merged) and initial versions of !116423 (merged)). In those cases it is not necessarily the logic of calculation that is changed but rather the properties of a metric, like for which edition's it's collected or in the case of an aggregation across multiple events, the included events.
Our documentation discourages changes to metrics, but in particular mentions "changes to the logic used to calculate any metric" which can be interpreted as excluding changes to the yml files. At the same time, we ourselves are not consistent in recommending when to create new metrics and there's recurring discussions about this topic since removing an existing metric and creating a new one can be felt as a big ask towards developers (see this office hours discussion: https://youtu.be/QwEue1sHSLY)
Proposed solution
In the long run, we should think about whether there are better solutions how to evolve metrics (see #408634), but for now, we need a consistent policy. The proposal would be the following
- Disallow any changes to existing metrics (logic or yml file) and enforce that in reviews
- Create a black-list of properties that should not be change since they can have influence on the calculation e.g.
time_frame
- Create escalation process when user's insists to change metric (e.g. because of a bug) which includes further stakeholders. This is already mentioned in the current documentation but could probably be made clearer
- Update documentation accordingly
- Add a dangerbot message if a metric change is discovered with a link to the relevant documentation to ease review.