# Observability: Telemetry for Model Accuracy
## Problems to solve
We would like to collect the prompts and the model's output to better understand the model's accuracy. This also means understanding quality end to end: the prompt sent, the suggestion returned, and whether the suggestion was accepted, with the ability to dissect the data by quality, language, and use case.
## Proposal

The changes we would need to implement for this proposal are:

1. Change the client to add the acceptance indicator on follow-on requests (gitlab-org/gitlab-vscode-extension!825, merged), and add the necessary fields to the Inference Server.
2. In the Model Gateway backend, parse the `X-GitLab-CS-Accepts`, `X-GitLab-CS-Requests`, and `X-GitLab-CS-Errors` headers.
3. In the Model Gateway backend, add `accepted_request_count`, `total_request_count`, and `error_request_count` fields to the access logs, containing the header values parsed in step 2.
4. In the Model Gateway backend, add `model_gateway_telemetery_accepts_total`, `model_gateway_telemetery_requests_total`, and `model_gateway_telemetery_errors_total` Prometheus counter metrics. On each request, increment each counter by the corresponding amount parsed in step 2. (A sketch of steps 2–4 follows this list.)
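As a rough illustration of steps 2–4, the sketch below assumes a FastAPI-style Model Gateway handler using `prometheus_client`. The header, log-field, and metric names come from the list above; the route path, logger name, and `_header_count` helper are hypothetical, not the actual gateway code.

```python
import logging

from fastapi import FastAPI, Request
from prometheus_client import Counter

# Proposed counters (names from step 4).
ACCEPTS_TOTAL = Counter("model_gateway_telemetery_accepts_total", "Accepted code suggestions")
REQUESTS_TOTAL = Counter("model_gateway_telemetery_requests_total", "Code suggestion requests")
ERRORS_TOTAL = Counter("model_gateway_telemetery_errors_total", "Code suggestion errors")

access_log = logging.getLogger("access")  # illustrative logger name
app = FastAPI()


def _header_count(request: Request, name: str) -> int:
    # Step 2: parse a telemetry header, defaulting to 0 when absent or malformed.
    try:
        return int(request.headers.get(name, 0))
    except ValueError:
        return 0


@app.post("/v2/completions")  # illustrative route path
async def completions(request: Request):
    accepts = _header_count(request, "X-GitLab-CS-Accepts")
    requests_ = _header_count(request, "X-GitLab-CS-Requests")
    errors = _header_count(request, "X-GitLab-CS-Errors")

    # Step 3: surface the parsed counts in the access logs.
    access_log.info(
        "code suggestion request",
        extra={
            "accepted_request_count": accepts,
            "total_request_count": requests_,
            "error_request_count": errors,
        },
    )

    # Step 4: increment the Prometheus counters by the parsed amounts.
    ACCEPTS_TOTAL.inc(accepts)
    REQUESTS_TOTAL.inc(requests_)
    ERRORS_TOTAL.inc(errors)

    # ... generate and return the code suggestion ...
    return {"choices": []}
```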
## Discussion Points
- Using a separate endpoint `/v2/completion_accepted` versus piggy-backing on follow-on requests. I think either of these should be fine, although the separate endpoint generates a bit more traffic. If we do go with a separate endpoint, we should probably consider batched updates on a timeout, which would be easy to do (see the sketch below).
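If we did choose the separate endpoint, a batched handler could be as small as the sketch below. The `/v2/completion_accepted` path comes from the discussion above; the payload shape, the aggregated-counts design, and the `model_gateway.metrics` import (reusing the counters from the previous sketch) are assumptions, not the actual gateway API.

```python
from fastapi import FastAPI
from pydantic import BaseModel

# Hypothetical module path re-exporting the counters from the previous sketch.
from model_gateway.metrics import ACCEPTS_TOTAL, ERRORS_TOTAL, REQUESTS_TOTAL

app = FastAPI()


class CompletionAcceptedBatch(BaseModel):
    # Hypothetical payload: the client accumulates acceptance events and
    # flushes them in a single request when its timeout fires.
    accepted_request_count: int = 0
    total_request_count: int = 0
    error_request_count: int = 0


@app.post("/v2/completion_accepted")  # hypothetical endpoint
async def completion_accepted(batch: CompletionAcceptedBatch):
    # Feed the same counters as the follow-on-request approach, so both
    # designs produce identical metrics downstream.
    ACCEPTS_TOTAL.inc(batch.accepted_request_count)
    REQUESTS_TOTAL.inc(batch.total_request_count)
    ERRORS_TOTAL.inc(batch.error_request_count)
    return {"status": "ok"}
```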
## Nice-to-have follow-on actions
- Send beacon telemetry on timeout. If the client does not send a follow-on request within a certain timeout, send the telemetry back as a beacon. This would need a new endpoint in the Model Gateway and would be implemented in a similar way to the follow-on code suggestion requests, but it is not needed for the first iteration.
- Add dashboards to Grafana and Kibana.
- Add details such as programming language, model version, etc.