Create Elasticsearch indexing utilization SLI

To set an SLO on the indexing process, we need a reliable way to measure the actual performance of the whole process, which we currently don't have.

We currently monitor the Elasticsearch indexing/initial indexing queue length and RPS, but we don't have a clear metric for the utilization SLI.

This metric should behave as follows:

  • Whenever the queue size increases, it should decrease
  • Whenever the queue size decreases, it should increase
  • Whenever the queue size < RPS, it should be 0 (or baseline)
  • Whenever the RPS increases, it should increase
  • Whenever the RPS decreases, it should decrease

Simply put, we should derive a metric that expresses the rate at which we are clearing the queue, whenever there is a queue.

Then I think we'll be in a good position to set up SLOs on this metric.
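
To make the desired behavior concrete, here is a minimal Python sketch of one possible candidate. It is not an agreed definition: the function name `clearing_score`, its parameters, and the formula (RPS minus the queue's growth rate) are all assumptions chosen only because they satisfy the five properties listed above. It also assumes that the queue size and RPS are sampled at a fixed interval.

```python
def clearing_score(queue_size, prev_queue_size, rps, interval_s):
    """Hypothetical candidate for the utilization metric (an assumption,
    not an agreed formula).

    queue_size / prev_queue_size: queue length at this and the previous sample.
    rps: indexing requests processed per second over the interval.
    interval_s: seconds between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    # No meaningful backlog: the queue would drain in under a second,
    # so report the baseline.
    if queue_size < rps:
        return 0.0

    # Positive when the queue grows, negative when it shrinks.
    queue_growth_per_s = (queue_size - prev_queue_size) / interval_s

    # Higher RPS raises the score; a growing queue lowers it.
    return rps - queue_growth_per_s
```

For example, with a 10-second interval and 50 RPS, a queue shrinking from 1200 to 1000 scores 70.0, while a queue growing from 800 to 1000 scores 30.0. The score can go negative when the queue grows faster than we process it, which may or may not be desirable for an SLI.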