Improve Duo Chat error classification

This is blocked by "Log error status of v2 chat agent correctly" (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#638 - closed)

Proposal

Address the following TODO comments:

https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/llm/chain/agents/single_action_executor.rb#L58-59

          # TODO: Improve these error messages. See https://gitlab.com/gitlab-org/gitlab/-/issues/479465
          # TODO Handle ForbiddenError, ClientError, ServerError.

https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/duo/chat/step_executor.rb#L101

          # TODO: Improve error handling

A few notes:

  • Treat hitting A9999 as a signal that there is an edge case we have not handled properly yet.
  • Don't misclassify exceptions, e.g. don't raise Client::ConnectionError when no connection error has occurred.
  • Document the error classification in https://docs.gitlab.com/ee/user/gitlab_duo_chat/troubleshooting.html.

Anti-pattern

The following error handling in V1 is an example of the anti-pattern:

https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/llm/chain/agents/zero_shot/executor.rb#L83-89

            rescue Gitlab::Llm::AiGateway::Client::ConnectionError => error
              Gitlab::ErrorTracking.track_exception(error)
              Answer.error_answer(
                error: error,
                context: context,
                error_code: "A1001"
              )

  • When an HTTP request to AI Gateway returns response.success? == false, there are several possible causes:
    • Client-side error (4xx): We composed the request to Anthropic incorrectly. The bug is in our code.
    • Server-side error (5xx): Anthropic failed to process our request even though it was composed correctly. The bug is in their code.
    • The HTTP request didn't reach AI Gateway. It's a network configuration issue, e.g. AI_GATEWAY_URL is set to a wrong address, AI Gateway is not up, a firewall blocks the request, etc.
    • etc.

Hence, because all of these cases are reported as the same ConnectionError with code A1001, the error is not actionable for either end users or us.
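
As a rough illustration of the alternative, the cases listed above could be mapped to distinct exception classes before an error code is chosen. This is only a sketch under assumptions: the ChatErrors module, FakeResponse, classify_response and perform_request are hypothetical names invented for this example and are not taken from the GitLab codebase; only the class names ForbiddenError, ClientError and ServerError echo the TODO comment.

```ruby
require "socket"
require "timeout"

# Hypothetical error hierarchy for illustration only.
module ChatErrors
  BaseError       = Class.new(StandardError)
  ForbiddenError  = Class.new(BaseError) # 403 from the gateway
  ClientError     = Class.new(BaseError) # other 4xx: our request was wrong
  ServerError     = Class.new(BaseError) # 5xx: upstream failed
  ConnectionError = Class.new(BaseError) # the request never got a response
end

# Minimal stand-in for an HTTP response object (assumption for this sketch).
FakeResponse = Struct.new(:code) do
  def success?
    (200..299).cover?(code)
  end
end

# Returns nil for a successful response, otherwise the matching error class.
def classify_response(response)
  return nil if response.success?

  case response.code
  when 403      then ChatErrors::ForbiddenError
  when 400..499 then ChatErrors::ClientError
  when 500..599 then ChatErrors::ServerError
  else               ChatErrors::BaseError
  end
end

# Runs the block that performs the request and raises a classified error.
# ConnectionError is raised only when the network call itself failed.
def perform_request
  response = yield
  error_class = classify_response(response)
  raise error_class, "AI Gateway responded with #{response.code}" if error_class

  response
rescue SocketError, Errno::ECONNREFUSED, Timeout::Error => e
  raise ChatErrors::ConnectionError, e.message
end
```

With a split like this, each rescue branch can carry its own error code and troubleshooting entry, and anything that still falls through to the generic branch points at an unhandled edge case.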

Out of scope

  • Use the exceptions to improve resiliency. This is left for a future iteration.
    • Example: Retry the step execution when 529 is returned from Anthropic; otherwise, don't retry.
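
For reference only, the retry idea in the example above might look roughly like the following once exceptions carry the upstream status. UpstreamError, RETRYABLE_STATUSES and with_retry are hypothetical names assumed for this sketch; a real implementation would likely also add a backoff delay between attempts.

```ruby
# Hypothetical error that carries the upstream HTTP status.
class UpstreamError < StandardError
  attr_reader :status

  def initialize(status)
    @status = status
    super("upstream responded with #{status}")
  end
end

# Only 529 is retried in this sketch; every other status fails immediately.
RETRYABLE_STATUSES = [529].freeze

# Yields the attempt number to the block and retries it on retryable errors
# until max_attempts is reached, then re-raises the last error.
def with_retry(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue UpstreamError => e
    retry if RETRYABLE_STATUSES.include?(e.status) && attempts < max_attempts
    raise
  end
end
```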

Related

https://gitlab.slack.com/archives/C053WFAK56U/p1723747844526719

Edited Sep 11, 2025 by 🤖 GitLab Bot 🤖