# Observability: Telemetry for Model Accuracy
## Problems to solve
We would like to collect the prompts and the model's output to better understand the model's accuracy. This also means understanding quality end to end: the prompt sent, the suggestion returned, and whether the suggestion was accepted, with the ability to dissect the data by quality, language, and use case.
## Proposal

The changes we would need to implement for this proposal are:

1. Change the client to add the acceptance indicator on follow-on requests (gitlab-org/gitlab-vscode-extension!825, merged), and add the necessary fields to the Inference Server.
2. In the Model Gateway backend, parse the `X-GitLab-CS-Accepts`, `X-GitLab-CS-Requests`, and `X-GitLab-CS-Errors` headers.
3. In the Model Gateway backend, add `accepted_request_count`, `total_request_count`, and `error_request_count` fields to the access logs, containing the header values parsed in step 2.
4. In the Model Gateway backend, add `model_gateway_telemetery_accepts_total`, `model_gateway_telemetery_requests_total`, and `model_gateway_telemetery_errors_total` Prometheus counter metrics. On each request, increment each counter by the corresponding amount parsed in step 2. (A sketch of steps 2–4 follows this list.)
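As a rough illustration of steps 2–4, the sketch below assumes a FastAPI-style Model Gateway handler using `prometheus_client`. The header, log-field, and metric names come from the list above; the route path, logger name, and `_header_count` helper are hypothetical, not the actual gateway code.

```python
import logging

from fastapi import FastAPI, Request
from prometheus_client import Counter

# Proposed counters (names from step 4).
ACCEPTS_TOTAL = Counter("model_gateway_telemetery_accepts_total", "Accepted code suggestions")
REQUESTS_TOTAL = Counter("model_gateway_telemetery_requests_total", "Code suggestion requests")
ERRORS_TOTAL = Counter("model_gateway_telemetery_errors_total", "Code suggestion errors")

access_log = logging.getLogger("access")  # illustrative logger name
app = FastAPI()


def _header_count(request: Request, name: str) -> int:
    # Step 2: parse a telemetry header, defaulting to 0 when absent or malformed.
    try:
        return int(request.headers.get(name, 0))
    except ValueError:
        return 0


@app.post("/v2/completions")  # illustrative route path
async def completions(request: Request):
    accepts = _header_count(request, "X-GitLab-CS-Accepts")
    requests_ = _header_count(request, "X-GitLab-CS-Requests")
    errors = _header_count(request, "X-GitLab-CS-Errors")

    # Step 3: surface the parsed counts in the access logs.
    access_log.info(
        "code suggestion request",
        extra={
            "accepted_request_count": accepts,
            "total_request_count": requests_,
            "error_request_count": errors,
        },
    )

    # Step 4: increment the Prometheus counters by the parsed amounts.
    ACCEPTS_TOTAL.inc(accepts)
    REQUESTS_TOTAL.inc(requests_)
    ERRORS_TOTAL.inc(errors)

    # ... generate and return the code suggestion ...
    return {"choices": []}
```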
## Discussion Points
- Using a separate endpoint `/v2/completion_accepted` versus piggy-backing on follow-on requests. I think either of these should be fine, although the separate endpoint generates a bit more traffic. If we do go with a separate endpoint, we should probably consider batched updates on a timeout, which would be easy to do (see the sketch below).
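If we did choose the separate endpoint, a batched handler could be as small as the sketch below. The `/v2/completion_accepted` path comes from the discussion above; the payload shape, the aggregated-counts design, and the `model_gateway.metrics` import (reusing the counters from the previous sketch) are assumptions, not the actual gateway API.

```python
from fastapi import FastAPI
from pydantic import BaseModel

# Hypothetical module path re-exporting the counters from the previous sketch.
from model_gateway.metrics import ACCEPTS_TOTAL, ERRORS_TOTAL, REQUESTS_TOTAL

app = FastAPI()


class CompletionAcceptedBatch(BaseModel):
    # Hypothetical payload: the client accumulates acceptance events and
    # flushes them in a single request when its timeout fires.
    accepted_request_count: int = 0
    total_request_count: int = 0
    error_request_count: int = 0


@app.post("/v2/completion_accepted")  # hypothetical endpoint
async def completion_accepted(batch: CompletionAcceptedBatch):
    # Feed the same counters as the follow-on-request approach, so both
    # designs produce identical metrics downstream.
    ACCEPTS_TOTAL.inc(batch.accepted_request_count)
    REQUESTS_TOTAL.inc(batch.total_request_count)
    ERRORS_TOTAL.inc(batch.error_request_count)
    return {"status": "ok"}
```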
## Nice-to-have follow-on actions
- Send beacon telemetry on timeout. If the client does not send a follow-on request within a certain timeout, send the telemetry back as a beacon. This would need a new endpoint in the Model Gateway and would be implemented in a similar way to the follow-on code suggestion requests, but it is not needed for the first iteration.
- Add dashboards to Grafana and Kibana.
- Add details such as programming language, model version, etc.