Make the 60s timeout applied by AIGW when making requests to self-hosted models configurable
## Proposal
A timeout error is sometimes experienced when running Duo operations against a self-hosted model, resulting in an A1000 error being presented to the user, e.g.:
```
{"severity":"INFO","time":"2025-07-28T09:07:05.794Z","correlation_id":"01K184J6BDSHFTA3TXBQ5FTH6F","meta.caller_id":"Llm::CompletionWorker","meta.feature_category":"ai_abstraction_layer","meta.organization_id":1,"meta.remote_ip":"172.30.84.228","meta.user":"user1","meta.user_id":7,"meta.client_id":"user/7","meta.root_caller_id":"GraphqlController#execute","params":{"messages":[{"role":"user","content":
<<< PROMPT >>>
","context":null,"current_file":null,"additional_context":[]}],"model_metadata":{"provider":"openai","name":"llama3","endpoint":"http://172.17.10.1:8100/v1","api_key":"","identifier":"custom_openai/RedhatAI/Llama-3.3-70B-Instruct-FP8-dynamic"},"unavailable_resources":["Pipelines","Vulnerabilities"]},"message":"Request to v2/chat/agent","class":"Gitlab::Duo::Chat::StepExecutor","ai_event_name":"performing_request","ai_component":"duo_chat"}
{"severity":"ERROR","time":"2025-07-28T09:08:06.486Z","correlation_id":"01K184J6BDSHFTA3TXBQ5FTH6F","meta.caller_id":"Llm::CompletionWorker","meta.feature_category":"ai_abstraction_layer","meta.organization_id":1,"meta.remote_ip":"172.83.83.83","meta.user":"user1","meta.user_id":7,"meta.client_id":"user/7","meta.root_caller_id":"GraphqlController#execute","error":"I'm sorry, I couldn't respond in time. Please try again.","duo_chat_error_code":"A1000","source":"chat_v2","message":"Net::ReadTimeout with #\u003cTCPSocket:(closed)\u003e","class":"Gitlab::Llm::Chain::Answer","ai_event_name":"error_returned","ai_component":"duo_chat"}
```
This seems to be related to the performance of the LLM and the amount of context data sent with the request; having the option to increase the timeout would allow long-running requests to complete successfully.
Requested in relation to a customer support ticket (ZD internal link).
## Definition of Done
- Customers can access and update a configurable parameter in the self-hosted AIGW/WFS to control the timeout value of requests to self-hosted models
- Support timeout values ranging from 30 seconds to 600 seconds (10 minutes) with appropriate validation
- There is proper error handling for timeout scenarios with meaningful error messages
- AI Gateway installation documentation has been updated to include new timeout configuration options; add configuration examples for different deployment scenarios (vLLM, AWS Bedrock, Azure OpenAI)
- Troubleshooting documentation has been updated to reference configurable timeout for A1000 errors
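The validation called out above could be sketched as follows. This is a hypothetical helper (none of these names come from the codebase), assuming the 30–600 second range from the Definition of Done and the current 60s default:

```ruby
# Hypothetical timeout-resolution helper: clamps nothing silently,
# rejects out-of-range values with a meaningful error message.
MIN_TIMEOUT_SECONDS = 30
MAX_TIMEOUT_SECONDS = 600
DEFAULT_TIMEOUT_SECONDS = 60

def resolve_timeout(configured)
  # Fall back to today's hardcoded default when nothing is configured.
  return DEFAULT_TIMEOUT_SECONDS if configured.nil?

  value = Integer(configured, exception: false)
  raise ArgumentError, "timeout must be an integer number of seconds" if value.nil?

  unless (MIN_TIMEOUT_SECONDS..MAX_TIMEOUT_SECONDS).cover?(value)
    raise ArgumentError,
      "timeout must be between #{MIN_TIMEOUT_SECONDS} and #{MAX_TIMEOUT_SECONDS} seconds"
  end

  value
end
```

Rejecting invalid values with an explicit error, rather than silently clamping, keeps misconfiguration visible to the administrator.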
## Proposal
- Add a new column `timeout_in_seconds` to `Ai::Setting`, keeping the default as `60`
- In the UI, present this new attribute and allow updates with values from 60 to 600
- Use this value in place of the hardcoded constants in `StepExecutor`, `AiGateway::Client` and `AiGateway::DocsClient`
- Update documentation
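To illustrate the last substitution, here is a minimal sketch (not the actual `AiGateway::Client`; the class and parameter names are illustrative) of how a configured value could replace a hardcoded 60s constant when issuing the HTTP request:

```ruby
require "net/http"
require "uri"

# Illustrative client: the configured timeout, when present, replaces the
# hardcoded default on both connect and read.
class ModelClient
  DEFAULT_TIMEOUT_SECONDS = 60

  attr_reader :timeout

  def initialize(endpoint, timeout_in_seconds: nil)
    @uri = URI.parse(endpoint)
    # Fall back to the current hardcoded default when no setting exists.
    @timeout = timeout_in_seconds || DEFAULT_TIMEOUT_SECONDS
  end

  def post(path, body)
    http = Net::HTTP.new(@uri.host, @uri.port)
    http.open_timeout = @timeout
    http.read_timeout = @timeout # governs the Net::ReadTimeout seen in the logs
    http.post(path, body, "Content-Type" => "application/json")
  end
end
```

Raising `read_timeout` is what directly addresses the `Net::ReadTimeout` in the error log above, since that exception fires when the model takes longer than the configured read window to respond.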