Use prometheus_fastapi_instrumentator to measure time to first chunk
Currently, the AI gateway SLIs measure time to last chunk, that is, until streaming is finished. Once https://github.com/trallnag/prometheus-fastapi-instrumentator/pull/290 is merged, we will be able to measure time to first received chunk, which is perhaps a better metric.
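To make the difference between the two measurements concrete, here is a minimal, standard-library-only sketch. It does not use prometheus_fastapi_instrumentator and does not reflect how the linked PR implements this; `timed_stream`, `fake_model_stream`, `collect` and the two callbacks are hypothetical names introduced only for illustration. In a real setup the callbacks would presumably feed a latency histogram.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, List, Tuple


async def timed_stream(
    chunks: AsyncIterator[bytes],
    on_first_chunk: Callable[[float], None],
    on_last_chunk: Callable[[float], None],
) -> AsyncIterator[bytes]:
    """Pass chunks through unchanged while timing the first and the last one.

    Hypothetical helper: the clock starts when the consumer begins iterating.
    """
    start = time.perf_counter()
    seen_first = False
    async for chunk in chunks:
        if not seen_first:
            seen_first = True
            # Time to first chunk: the latency a streaming client perceives.
            on_first_chunk(time.perf_counter() - start)
        yield chunk
    # Time to last chunk: what is measured today (stream fully consumed).
    on_last_chunk(time.perf_counter() - start)


async def fake_model_stream() -> AsyncIterator[bytes]:
    """Stand-in for a streaming model response (assumption, not real code)."""
    for i in range(3):
        await asyncio.sleep(0.01)
        yield f"chunk-{i}".encode()


async def collect() -> Tuple[List[bytes], List[float], List[float]]:
    first: List[float] = []
    last: List[float] = []
    stream = timed_stream(fake_model_stream(), first.append, last.append)
    body = [chunk async for chunk in stream]
    return body, first, last


if __name__ == "__main__":
    body, first, last = asyncio.run(collect())
    print(f"first chunk after {first[0]:.3f}s, last chunk after {last[0]:.3f}s")
```

For long chat responses the two values can differ a lot, which is why a threshold tuned for time to last chunk may not carry over to time to first chunk.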
The following discussion from !6928 (merged) should be addressed:
- @jprovaznik started a discussion: (+5 comments) @shinya.maeda WDYT about these thresholds for the chat endpoint? Is 15s enough, or should this be increased further? Based on gitlab-org/gitlab#425095 (comment 1792999180) it seems that the average value for p95 is 30s.