Utilization metrics should include thresholds
Tamland performs forecasts on three different types of metrics:
- Saturation metrics, which have an absolute limit. Exceeding this limit may lead to system degradation.
- RPS (requests-per-second) rates, which do not have limits, but which help forecast user growth.
- Utilization metrics, which are similar to saturation metrics, but which do not have a defined upper limit.
Proposal: add "soft" limits/thresholds to utilization metrics
Add the option to define soft thresholds that we would prefer not to cross. Examples include:
- 100GB limit for database tables
- Cloudflare transfer costs, which tie in with usage contracts
Saturation metric thresholds are expressed as a unitless percentage or ratio. By contrast, thresholds for utilization metrics are in the same unit as the metric itself (e.g., bytes or seconds).
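To make the unit distinction concrete, here is a minimal Python sketch of how the two kinds of threshold could be modelled. It is not Tamland's actual data model: the class names, field names, the `db_table_size` metric name, and the choice of decimal gigabytes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SaturationMetric:
    """Hypothetical model: the threshold is a unitless ratio of a hard limit."""
    name: str
    threshold_ratio: float  # e.g. 0.9 means 90% of the absolute limit


@dataclass(frozen=True)
class UtilizationMetric:
    """Hypothetical model: the soft threshold shares the metric's own unit."""
    name: str
    unit: str  # e.g. "bytes", "seconds"
    soft_threshold: Optional[float] = None  # None means no soft threshold is set

    def exceeds(self, value: float) -> bool:
        # A metric without a soft threshold can never exceed it.
        return self.soft_threshold is not None and value > self.soft_threshold


# Assumption: decimal gigabytes; the proposal does not say GB vs GiB.
GB = 10**9

# Hypothetical metric mirroring the "100GB limit for database tables" example.
table_size = UtilizationMetric("db_table_size", unit="bytes", soft_threshold=100 * GB)
```

Keeping `soft_threshold` optional means existing utilization metrics without a threshold would continue to work unchanged, which seems to fit the "add the option" wording of the proposal.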
Initially, we will plot this value on the utilization graphs. In future iterations, we can start alerting on all values exceeding the threshold, and forecasting when values will exceed the threshold.
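The two later iterations (alerting and forecasting) could look roughly like the sketch below. It uses a plain least-squares straight line as a stand-in for a forecast; this is a deliberately simple substitute and not the forecasting model Tamland uses. The function names and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (day, value in the metric's own unit)


def values_over_threshold(samples: List[Sample], threshold: float) -> List[Sample]:
    """Return the samples whose value is strictly above the soft threshold."""
    return [(t, v) for t, v in samples if v > threshold]


def crossing_day(samples: List[Sample], threshold: float) -> Optional[float]:
    """Extrapolate a least-squares line and return the day it reaches the threshold.

    Returns None when there are too few samples or the trend is flat or
    decreasing. The result may lie in the past if the fitted line is already
    above the threshold.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, so no slope can be fitted
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # not growing, so the line never reaches the threshold
    intercept = mean_v - slope * mean_t
    return (threshold - intercept) / slope


# Hypothetical table size in GB, growing by 2 GB per day.
history = [(0, 60.0), (1, 62.0), (2, 64.0), (3, 66.0)]
over = values_over_threshold(history, threshold=100.0)
eta = crossing_day(history, threshold=100.0)
```

With this sample history nothing is over the 100 GB threshold yet, and the fitted line reaches it on day 20. A real implementation would presumably reuse the existing forecast output rather than fit a separate line.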
cc @edjdev following discussion in Engineering Allocation call: https://docs.google.com/document/d/164hNObllaLWosG110-A0UouYlcaqOxbPpHATFD38_Gw/edit#bookmark=id.e0l16jj39dec