Utilization metrics should include thresholds

Tamland performs forecasts on three different types of metrics:

  1. Saturation metrics, which have an absolute limit. Exceeding this limit may lead to system degradation.
  2. RPS (requests-per-second) rates, which do not have limits, but which help forecast user growth.
  3. Utilization metrics, which are similar to saturation metrics, but which do not have a defined upper limit.

Proposal: add "soft" limits/thresholds to utilization metrics

Add the option to define soft thresholds that we would prefer not to cross. Examples include:

  1. 100GB limit for database tables
  2. Cloudflare transfer costs, which tie in with usage contracts

Unlike saturation metric thresholds, which are expressed as a percentage/ratio and are unitless, the thresholds for utilization metrics are in the same unit as the utilization metric (e.g., bytes or seconds).
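As a rough illustration of a threshold carried in the metric's own unit, here is a minimal Python sketch. The `UtilizationMetric` class, its fields, the `exceeds_soft_threshold` helper, and the `pg_table_size` name are all hypothetical and are not Tamland's actual schema or API; the 100GB value is taken from the database table example above.

```python
from dataclasses import dataclass
from typing import Optional

GB = 10**9  # decimal gigabyte, used here only for illustration


@dataclass(frozen=True)
class UtilizationMetric:
    """Hypothetical utilization metric definition (not Tamland's real schema)."""

    name: str
    unit: str  # e.g. "bytes" or "seconds"
    # Optional soft threshold, expressed in `unit` rather than as a ratio.
    soft_threshold: Optional[float] = None


def exceeds_soft_threshold(metric: UtilizationMetric, value: float) -> bool:
    """Return True when `value` (in the metric's unit) is above the soft threshold."""
    if metric.soft_threshold is None:
        # No threshold configured: the metric behaves as it does today.
        return False
    return value > metric.soft_threshold


# Assumed example: a 100GB soft limit on a database table size metric.
table_size = UtilizationMetric("pg_table_size", unit="bytes", soft_threshold=100 * GB)
```

Keeping the threshold optional means existing utilization metrics without a soft limit would be unaffected.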

Initially, we will plot this value on the utilization graphs. In future iterations, we can start alerting on all values exceeding the threshold, and forecasting when values will exceed the threshold.
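The later iteration, forecasting when values will exceed the threshold, could be sketched roughly as below. This assumes the forecast is already available as a series of (date, value) pairs in the metric's unit; the `first_crossing` function and the sample data are hypothetical and do not reflect how Tamland produces or stores forecasts.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

GB = 10**9  # decimal gigabyte, used here only for illustration


def first_crossing(
    forecast: Iterable[Tuple[date, float]], threshold: float
) -> Optional[date]:
    """Return the first forecast date whose value is above `threshold`, if any."""
    for day, value in forecast:
        if value > threshold:
            return day
    return None  # the forecast never crosses the threshold


# Made-up forecast: a table starting at 90GB and growing by 1GB per day.
start = date(2024, 1, 1)
forecast = [(start + timedelta(days=i), 90 * GB + i * GB) for i in range(30)]

crossing = first_crossing(forecast, threshold=100 * GB)
```

The same date could also be drawn as a marker on the utilization graph alongside the plotted threshold.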

cc @qmnguyen0711

cc @edjdev following discussion in Engineering Allocation call: https://docs.google.com/document/d/164hNObllaLWosG110-A0UouYlcaqOxbPpHATFD38_Gw/edit#bookmark=id.e0l16jj39dec