Add cut off warning when non agentic chat does not complete

What does this MR do and why?

Add a check on the last non-agentic message streamed from the ai-gateway. If that message has no finish_reason, the response was cut off, so we append a warning to inform the user.
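
A minimal sketch of the check, written in Python for illustration only (the actual change lives in the Rails codebase). The event shape (a dict with a `content` string and an optional `finish_reason`), the function name `append_cut_off_warning`, and the warning text are assumptions, not taken from this MR.

```python
from typing import Optional

# Hypothetical warning text; the real wording in the MR may differ.
CUT_OFF_WARNING = "\n\n*This response was cut off before it finished.*"


def append_cut_off_warning(events: list[dict]) -> str:
    """Join streamed chunks and append a warning if the stream ended early.

    Each event is assumed to be a dict with a "content" string and an
    optional "finish_reason"; only the last event is inspected.
    """
    text = "".join(event.get("content", "") for event in events)
    if not events:
        return text
    last_finish_reason: Optional[str] = events[-1].get("finish_reason")
    if not last_finish_reason:
        # No finish_reason on the final chunk: the stream was interrupted.
        return text + CUT_OFF_WARNING
    return text
```

Only the final event matters here: intermediate chunks normally carry no finish_reason, so checking them would produce false positives.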

This is behind the feature flag duo_non_agentic_chat_ai_message_cut_off_warning (#598963) because the change cannot be tested on gdk: the main source of the cut-off is the Cloud Run ingress 90-second timeout, see gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!4403 (comment 3248047515).

This is the initial MR. The ultimate goal is to make the cut-off message more useful by also closing any open code blocks, i.e. to fix code being displayed as plain text:

image.png

Further steps

  1. Remove the feature flag
  2. Possibly add the same warning for finish_reason == "length"
    1. Either remove the code from gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!4403 (merged) first (I am in favour of this one, as the code does not work on prod anyway)
    2. or add both behind a feature flag with the ai-gateway code enabled by default, and then flip the flags and remove the code from the ai-gateway
  3. Add regex to detect code blocks that are not closed, and close them (rails version of gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!4672 (diffs) which cannot be merged now)
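
Step 3 could look roughly like the following. This is a Python sketch of the idea only, not the planned Rails implementation or the code from ai-assist!4672. The names `FENCE`, `FENCE_LINE` and `close_open_code_block` are hypothetical, and the heuristic assumes fences are lines starting with three backticks (it ignores tilde fences and longer fences).

```python
import re

# Three backticks, built programmatically to keep this example readable.
FENCE = "`" * 3

# Matches a fence at the start of a line, allowing leading indentation.
FENCE_LINE = re.compile(r"^[ \t]*" + re.escape(FENCE), re.MULTILINE)


def close_open_code_block(text: str) -> str:
    """Append a closing fence if the text contains an odd number of fences."""
    if len(FENCE_LINE.findall(text)) % 2 == 1:
        # Make sure the closing fence starts on its own line.
        separator = "" if text.endswith("\n") else "\n"
        return text + separator + FENCE
    return text
```

An odd number of fence lines means the last block was opened but never closed, which is exactly the state a truncated stream leaves behind.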

References

Issue: #583882. Also has child issues with followups

Feature Flag rollout issue: #598963

First attempt: gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!4403 (merged)

The first attempt detected the finish reason passed by the provider (Anthropic, Vertex, etc.) and added a warning if the finish_reason was length or max_tokens.

This worked well in gdk, but I could not replicate it in production because the Cloud Run ingress HTTP connection would time out first.

Screenshots or screen recordings

| Before | After (flag enabled) |
| --- | --- |
| image.png | image.png |
| image.png | image.png |

How to set up and validate locally

  1. Enable the feature flag
    1. Run Feature.enable(:duo_non_agentic_chat_ai_message_cut_off_warning, User.find(...)) in gdk rails c
  2. Apply the diff below to the ai-gateway to make it stop streaming mid-message.
  3. Send a message that expects an answer longer than a sentence. You can also customise the final_answer_count threshold.
  4. The Duo response should be cut off, but end with a warning.
  5. Also confirm the behaviour in VS Code.
diff --git a/ai_gateway/chat/agents/react.py b/ai_gateway/chat/agents/react.py
index 8510419b0..2f476de28 100644
--- a/ai_gateway/chat/agents/react.py
+++ b/ai_gateway/chat/agents/react.py
@@ -245,6 +245,7 @@ class ReActAgent(RunnableBinding[ReActAgentInputs, TypeAgentEvent]):
         len_final_answer = 0
         agent_final_answer_found = False
         agent_tool_action_found = False
+        final_answer_count = 0
 
         try:
             async for event in astream:
@@ -263,6 +264,10 @@ class ReActAgent(RunnableBinding[ReActAgentInputs, TypeAgentEvent]):
                         response = self._append_final_message_warnings(response)
                         yield cast(TypeAgentEvent, response)
 
+                        final_answer_count += 1
+                        if final_answer_count >= 5:
+                            return
+
                         len_final_answer = len(event.text)
 
                 events.append(event)
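
The effect of the diff above can be reproduced in isolation with a small simulation. This is a sketch under assumptions: `stream_answer`, `collect`, `was_cut_off` and the event shape are hypothetical stand-ins, not ai-gateway APIs. It shows that an early `return` from an async generator ends the stream without ever emitting a final event carrying a finish_reason.

```python
import asyncio
from typing import AsyncIterator, Optional


async def stream_answer(
    chunks: list[str], cut_after: Optional[int] = None
) -> AsyncIterator[dict]:
    """Yield one event per chunk; stop early after `cut_after` events."""
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        yield {"content": chunk, "finish_reason": "stop" if is_last else None}
        if cut_after is not None and index + 1 >= cut_after:
            # Mirrors the early `return` in the diff: the stream simply ends.
            return


async def collect(stream: AsyncIterator[dict]) -> list[dict]:
    """Drain the stream into a list, as a consumer would."""
    return [event async for event in stream]


def was_cut_off(events: list[dict]) -> bool:
    """True if the last received event carries no finish_reason."""
    return bool(events) and events[-1].get("finish_reason") is None
```

For example, `asyncio.run(collect(stream_answer(chunks, cut_after=5)))` yields only the first five events of a longer answer, and `was_cut_off` returns True for that list.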

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Kaveh Nejad
