Retry mechanism in step executor
Description
When an error event is returned from v2/chat/agent, we should retry the execution to increase the chance of a successful operation.
Proposal
- Add a `retryable` field to the Error agent event.
- When `retryable: true`, we should retry the operation. This covers temporary system failures (e.g. a system overload error).
- When `retryable: false`, we should surface the error to the user, e.g. `Something went wrong during the request. Try "/clean" and request again.` This covers client errors, e.g. `invalid_request_error` or exceeding the max token length limit.
- When `retryable` is nil, do nothing.
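The three cases above can be sketched as a small decision function. This is only an illustration in Python, not the actual GitLab client code (which is Ruby); the `Action` enum, the `decide` function, and the dict-shaped event are hypothetical names chosen for the example.

```python
from enum import Enum
from typing import Any, Mapping


class Action(Enum):
    """Hypothetical client-side reactions to an error event."""
    RETRY = "retry"      # retryable: true
    SURFACE = "surface"  # retryable: false
    NOOP = "noop"        # retryable missing (nil)


def decide(error_event: Mapping[str, Any]) -> Action:
    """Map the `retryable` field of an error event to an action."""
    retryable = error_event.get("retryable")
    if retryable is None:
        # Field absent: the proposal says to do nothing special.
        return Action.NOOP
    return Action.RETRY if retryable else Action.SURFACE
```

One thing to keep in mind when comparing this with the example diff: the Ruby accessor there reads the field as `data["retryable"] || false`, so a missing value is treated the same as `false` rather than as a separate nil case.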
Example
GitLab-Sidekiq:
diff --git a/ee/lib/gitlab/duo/chat/agent_events/error.rb b/ee/lib/gitlab/duo/chat/agent_events/error.rb
index 8c2b6c4da6f5..ec154a7eb1ce 100644
--- a/ee/lib/gitlab/duo/chat/agent_events/error.rb
+++ b/ee/lib/gitlab/duo/chat/agent_events/error.rb
@@ -8,6 +8,10 @@ class Error < BaseEvent
           def message
             data["message"]
           end
+
+          def retryable
+            data["retryable"] || false
+          end
         end
       end
     end
diff --git a/ee/lib/gitlab/llm/chain/agents/single_action_executor.rb b/ee/lib/gitlab/llm/chain/agents/single_action_executor.rb
index 3b93189903a7..842f78c10952 100644
--- a/ee/lib/gitlab/llm/chain/agents/single_action_executor.rb
+++ b/ee/lib/gitlab/llm/chain/agents/single_action_executor.rb
@@ -19,6 +19,7 @@ class SingleActionExecutor
           attr_accessor :iterations

           MAX_ITERATIONS = 10
+          MAX_RETRY_STEP_FORWARD = 1

           # @param [String] user_input - a question from a user
           # @param [Array<Tool>] tools - an array of Tools defined in the tools module.
@@ -36,7 +37,7 @@ def initialize(user_input:, tools:, context:, response_handler:, stream_response
           def execute
             MAX_ITERATIONS.times do
-              events = step_forward
+              events = with_agent_retry { step_forward }
               raise EmptyEventsError if events.empty?
@@ -294,6 +295,26 @@ def current_blob
           def chat_feature_setting
             ::Ai::FeatureSetting.find_by_feature(:duo_chat)
           end
+
+          def with_agent_retry
+            retries = 0
+
+            begin
+              yield
+            rescue AgentEventError => ex
+              raise ex if retries >= MAX_RETRY_STEP_FORWARD
+              raise ex unless ex.retryable?
+
+              log_warn(message: "Retrying agent step forward",
+                event_name: 'retry',
+                ai_component: 'duo_chat',
+                ai_error_class: ex.class.name,
+                ai_error_message: ex.message)
+
+              retries += 1
+              retry
+            end
+          end
         end
       end
     end
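To make the retry budget concrete, here is a rough Python analogue of the `with_agent_retry` helper from the Ruby diff. It is a sketch under assumptions, not the proposed implementation: the Python `AgentEventError` class and its `retryable` attribute are hypothetical stand-ins for the Ruby error and its `retryable?` predicate, and logging is omitted.

```python
MAX_RETRY_STEP_FORWARD = 1  # same budget as in the Ruby diff


class AgentEventError(Exception):
    """Hypothetical stand-in for the Ruby AgentEventError."""

    def __init__(self, message: str, retryable: bool = False) -> None:
        super().__init__(message)
        self.retryable = retryable


def with_agent_retry(step):
    """Call `step`; re-run it on retryable errors, at most MAX_RETRY_STEP_FORWARD times."""
    retries = 0
    while True:
        try:
            return step()
        except AgentEventError as ex:
            # Give up when the budget is spent or the error is not retryable.
            if retries >= MAX_RETRY_STEP_FORWARD or not ex.retryable:
                raise
            retries += 1


# Usage: the first call fails with a retryable error, the second succeeds.
calls = []


def flaky_step():
    calls.append(1)
    if len(calls) == 1:
        raise AgentEventError("overloaded_error", retryable=True)
    return ["final_answer"]


result = with_agent_retry(flaky_step)
```

With a budget of 1, a step runs at most twice in total: a second retryable failure is re-raised to the caller, and a non-retryable failure is re-raised immediately.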
AI-Gateway:
diff --git a/ai_gateway/chat/agents/react.py b/ai_gateway/chat/agents/react.py
index a64d0ebd..bdc4740f 100644
--- a/ai_gateway/chat/agents/react.py
+++ b/ai_gateway/chat/agents/react.py
@@ -251,7 +251,12 @@ class ReActAgent(Prompt[ReActAgentInputs, TypeAgentEvent]):
                 events.append(event)
         except Exception as e:
-            yield AgentError(message=str(e))
+            retryable = False
+
+            if "overloaded_error" in str(e):
+                retryable = True
+
+            yield AgentError(message=str(e), retryable=retryable)
             raise

         if any(isinstance(e, AgentFinalAnswer) for e in events):
diff --git a/ai_gateway/chat/agents/typing.py b/ai_gateway/chat/agents/typing.py
index 674f2a7f..d2673311 100644
--- a/ai_gateway/chat/agents/typing.py
+++ b/ai_gateway/chat/agents/typing.py
@@ -45,6 +45,7 @@ class AgentUnknownAction(AgentBaseEvent):
 class AgentError(AgentBaseEvent):
     type: str = "error"
     message: str
+    retryable: bool


 TypeAgentEvent = TypeVar(
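The classification step in the AI-Gateway diff can also be isolated into a small helper, which makes it easier to test and to extend with further error markers. The following is a minimal sketch, assuming a plain dataclass in place of the real `AgentError` event; `RETRYABLE_MARKERS` and `to_agent_error` are hypothetical names, and the `False` default for `retryable` is an assumption (the diff declares the field without a default).

```python
from dataclasses import dataclass

# Hypothetical marker list; the diff only checks for "overloaded_error".
RETRYABLE_MARKERS = ("overloaded_error",)


@dataclass
class AgentError:
    """Simplified stand-in for the AgentError event in the diff."""
    message: str
    retryable: bool = False  # assumption: errors are non-retryable unless marked
    type: str = "error"


def to_agent_error(exc: Exception) -> AgentError:
    """Build an error event, flagging it retryable if the message matches a marker."""
    text = str(exc)
    retryable = any(marker in text for marker in RETRYABLE_MARKERS)
    return AgentError(message=text, retryable=retryable)
```

Substring matching on the exception text mirrors the diff and is deliberately simple; matching on an exception type or status code would be less fragile if the upstream client exposes one.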
Edited by Shinya Maeda