Create Elasticsearch indexing utilization SLI
To set an SLO on the indexing process, we need a reliable way to measure the actual performance of the whole process, which we currently lack.
We currently monitor the Elasticsearch indexing/initial indexing queue length and RPS, but we don't have a clear metric for the utilization SLI.
This metric should behave as follows:
- Whenever the queue size increases, it should decrease
- Whenever the queue size decreases, it should increase
- Whenever the queue size is below the RPS, it should be 0 (or a baseline value)
- Whenever the RPS increases, it should increase
- Whenever the RPS decreases, it should decrease
Simply put, we should derive a metric that expresses the rate at which we are clearing the queue, whenever there is a queue.
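
As a starting point for discussion, here is a minimal sketch of one candidate that satisfies the properties listed above: the fraction of the queue cleared per second (`rps / queue_size`), forced to 0 whenever the queue is smaller than the RPS. The function name `clearing_rate` and the choice of 0 as the baseline are assumptions for illustration, not an agreed definition.

```python
def clearing_rate(queue_size: float, rps: float) -> float:
    """Candidate utilization SLI (hypothetical): fraction of the queue cleared per second.

    Returns 0.0 (the assumed baseline) when the queue is smaller than the RPS,
    i.e. when there is effectively no backlog.
    """
    if queue_size < 0 or rps < 0:
        raise ValueError("queue_size and rps must be non-negative")
    # No meaningful backlog: the queue drains in under a second (or is empty).
    if queue_size == 0 or queue_size < rps:
        return 0.0
    # Grows with RPS, shrinks as the queue grows.
    return rps / queue_size


if __name__ == "__main__":
    # Illustrative inputs only, not real measurements.
    for queue_size, rps in [(1000, 100), (2000, 100), (1000, 200), (50, 100)]:
        print(queue_size, rps, clearing_rate(queue_size, rps))
```

Note that this candidate jumps from 1 to 0 as the queue size drops below the RPS, which may or may not be desirable for alerting; a different baseline could be substituted.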
Then I think we'll be in a good position to set up SLOs on this metric.