Histogram for measuring duration until first received response chunk

Jan Provaznik requested to merge jp-add-chunk-histogram into main

What does this merge request do and why?

This merge request adds a histogram that measures the time until the first response chunk is received (when streaming is used) or until the full response is received (when streaming is not used). It can back an SLI that more accurately reflects when something "shows up" on the user's screen for code completion or generation, because the gap between the first and the last stream chunk can be large, especially for code generation.
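To illustrate the idea, here is a minimal sketch of timing the first chunk of a streamed response. It is not the code from this merge request: `DurationHistogram` is a hypothetical stdlib-only stand-in for a Prometheus-style histogram, and `observe_first_chunk`, `fake_stream`, and the bucket boundaries are assumed names and values chosen for the example.

```python
import time
from bisect import bisect_left
from typing import Callable, Iterable, Iterator, List


class DurationHistogram:
    """Hypothetical stand-in for a Prometheus-style histogram."""

    def __init__(self, buckets: List[float]) -> None:
        self.buckets = sorted(buckets)
        # One counter per finite bucket plus a final overflow (+Inf) bucket.
        self.counts = [0] * (len(self.buckets) + 1)
        self.total = 0.0
        self.observations = 0

    def observe(self, seconds: float) -> None:
        # bisect_left returns the index of the first bucket bound >= seconds.
        self.counts[bisect_left(self.buckets, seconds)] += 1
        self.total += seconds
        self.observations += 1


def observe_first_chunk(
    chunks: Iterable[str],
    histogram: DurationHistogram,
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[str]:
    """Yield chunks unchanged and record the delay before the first one.

    The timer starts when the consumer begins iterating. A non-streamed
    response can be passed as a one-element iterable, in which case the
    recorded value is the time to the full response.
    """
    start = clock()
    first_seen = False
    for chunk in chunks:
        if not first_seen:
            histogram.observe(clock() - start)
            first_seen = True
        yield chunk


def fake_stream() -> Iterator[str]:
    """Assumed example stream: a short wait, then a much longer one."""
    time.sleep(0.05)
    yield "def hello():"
    time.sleep(0.2)
    yield " ..."


histogram = DurationHistogram(buckets=[0.1, 0.5, 1.0, 5.0])
text = "".join(observe_first_chunk(fake_stream(), histogram))
print(histogram.observations, round(histogram.total, 2))
```

Only the wait before the first chunk is observed, so the later, longer wait in `fake_stream` does not affect the recorded value. An empty stream records nothing.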

Related to gitlab-org/gitlab#425095 (comment 1721538862)

How to set up and validate locally

  1. Execute a code generation request that uses streaming.
  2. Check that the new metric is updated (curl http://192.168.1.8:8082/metrics | grep code_suggestions_inference_first_response_duration).
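The check in step 2 can also be scripted. The sketch below is an assumption-laden alternative to the `curl | grep` command: `matching_metric_lines` and `fetch_metric_lines` are hypothetical helper names, and unlike `grep` the filter skips `#` comment lines so that only sample lines remain.

```python
from typing import List
from urllib.request import urlopen


def matching_metric_lines(exposition_text: str, name_fragment: str) -> List[str]:
    """Return sample lines (not '#' comments) that mention name_fragment."""
    return [
        line
        for line in exposition_text.splitlines()
        if name_fragment in line and not line.startswith("#")
    ]


def fetch_metric_lines(url: str, name_fragment: str, timeout: float = 5.0) -> List[str]:
    """Download a metrics page and keep only the matching sample lines."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return matching_metric_lines(body, name_fragment)
```

Calling `fetch_metric_lines("http://192.168.1.8:8082/metrics", "code_suggestions_inference_first_response_duration")` before and after the streaming request and comparing the returned lines shows whether the histogram was updated.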

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Jan Provaznik
