Currently, the score on the group::duo chat Error Budget dashboard is below 60.0, which means our feature is not performing well. We should identify the root cause and fix it.
According to the Rails Requests Apdex, requests to POST /api/:version/chat/completions are slower than the default threshold (< 1 sec).
Proposal
This endpoint is used for evaluations by the group::ai model validation group, so this is not user-facing latency. We should set it to urgency :low (< 5 sec) or ignore it completely.
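For reference, a minimal sketch of what the urgency override could look like in the Grape API class, assuming the usual GitLab DSL (the class, namespace, and route names below are illustrative, not the real ones):

```ruby
# Illustrative sketch only -- the real API class and file for this endpoint
# may be named differently.
module API
  class ChatCompletions < ::API::Base
    feature_category :duo_chat

    # Lower the request Apdex target for every route in this class from the
    # default (< 1 s) to low urgency (< 5 s).
    urgency :low

    namespace 'chat' do
      post 'completions' do
        # ... synchronous LLM call ...
      end
    end
  end
end
```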
Over the weekend there were 30 calls. Even though we have set the urgency to low, the Apdex score was still affected (down to 78.9%) because all /api/:version/chat/completions calls took between 5 and 10 seconds.
Currently, this endpoint calls the LLM synchronously, which means it is not always possible to reduce the duration below our current goal of 5 seconds. For example, any question that requires 2 sequential LLM steps will go over that limit.
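As a rough illustration of why this happens (method names and timings below are made up, not the actual implementation):

```ruby
# Hypothetical sketch: names and latencies are illustrative only.
def llm_step(label, seconds)
  sleep(seconds) # stands in for one synchronous LLM round trip
  "#{label} done"
end

def chat_completion(question)
  plan = llm_step("plan for #{question}", 3)  # step 1: pick tools / build a plan
  llm_step("answer from #{plan}", 3)          # step 2: generate the final answer
end

# Two sequential ~3 s round trips already take ~6 s, above the 5 s
# low-urgency target, before any Rails overhead is added.
```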
The short-term alternatives are:

- Ask the model validation team to use the GraphQL endpoint instead
- Set its feature_category to "not_owned" (this is a hack; see the sketch below)

The long-term solution could be:

- Make this endpoint work asynchronously
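For completeness, the "not_owned" hack would look roughly like this (illustrative class name again). It only removes the endpoint's requests from the duo_chat error budget; the latency itself does not improve:

```ruby
# Illustrative sketch of the hack: re-categorise the endpoint so its requests
# stop counting against the duo_chat error budget. We simply stop measuring
# it under our group, which is why it is a hack rather than a fix.
module API
  class ChatCompletions < ::API::Base
    feature_category :not_owned
  end
end
```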
Another cause of the low error budget score is that we have too few measurements.
Currently we have around 30 measurements daily, which is far fewer than the number of Duo Chat requests we actually serve. With so few data points, a single slow internal request to /api/:version/chat/completions can skew the score by roughly 3 percentage points.
Most of the real user requests go through Llm::CompletionWorker, which is classified as ai_abstraction_layer because it serves all AI requests. We are probably missing most of the good measurements there.
Instead, there is a separate SLI called llm_completion. It is properly categorized as duo_chat. I think we should track that instead of the generic Apdex score? Thoughts?
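To make the difference concrete, here is a simplified sketch of the two measurement paths, assuming the standard ApplicationWorker and Gitlab::Metrics::Sli patterns (the label set and threshold below are assumptions, and the real worker is considerably more involved):

```ruby
# Simplified, illustrative sketch -- not the actual worker code.
module Llm
  class CompletionWorker
    include ApplicationWorker

    # Generic Sidekiq/request SLIs attribute this work to ai_abstraction_layer,
    # because the worker serves every AI feature, not just Duo Chat.
    feature_category :ai_abstraction_layer

    def perform(*)
      started_at = ::Gitlab::Metrics::System.monotonic_time
      # ... run the completion ...
      duration = ::Gitlab::Metrics::System.monotonic_time - started_at

      # The dedicated llm_completion application SLI can carry duo_chat as its
      # feature_category label, so tracking it would reflect real Duo Chat
      # traffic much better than the generic request Apdex does.
      ::Gitlab::Metrics::Sli::Apdex[:llm_completion].increment(
        labels: { feature_category: :duo_chat }, # label set is an assumption
        success: duration <= 30                  # threshold is illustrative
      )
    end
  end
end
```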
> Ask the model validation team to use the GraphQL endpoint instead
I think this should be the long-term solution. Currently it's a burden for us to maintain both GraphQL and REST. More importantly, evaluations should use exactly the same flow as production; otherwise we can't fully trust that evaluation performance reflects production performance.
> Instead, there is a separate SLI called llm_completion. It is properly categorized as duo_chat. I think we should track that instead of the generic Apdex score? Thoughts?
In this case, maybe we don't need to stick strictly to the error budget dashboard, but can introduce a new Grafana dashboard instead. FYI, there are a few chat-related SLIs in this AI Gateway dashboard, but we should have a dedicated Duo Chat dashboard covering the entire service, including both GitLab and AI Gateway.