Make the 60s timeout applied by AIGW when making requests to self-hosted models configurable
## Proposal
A timeout error is sometimes experienced when running Duo operations against a self-hosted model, resulting in an A1000 error being presented to the user, e.g.:
```
{"severity":"INFO","time":"2025-07-28T09:07:05.794Z","correlation_id":"01K184J6BDSHFTA3TXBQ5FTH6F","meta.caller_id":"Llm::CompletionWorker","meta.feature_category":"ai_abstraction_layer","meta.organization_id":1,"meta.remote_ip":"172.30.84.228","meta.user":"user1","meta.user_id":7,"meta.client_id":"user/7","meta.root_caller_id":"GraphqlController#execute","params":{"messages":[{"role":"user","content":
<<< PROMPT >>>
","context":null,"current_file":null,"additional_context":[]}],"model_metadata":{"provider":"openai","name":"llama3","endpoint":"http://172.17.10.1:8100/v1","api_key":"","identifier":"custom_openai/RedhatAI/Llama-3.3-70B-Instruct-FP8-dynamic"},"unavailable_resources":["Pipelines","Vulnerabilities"]},"message":"Request to v2/chat/agent","class":"Gitlab::Duo::Chat::StepExecutor","ai_event_name":"performing_request","ai_component":"duo_chat"}
{"severity":"ERROR","time":"2025-07-28T09:08:06.486Z","correlation_id":"01K184J6BDSHFTA3TXBQ5FTH6F","meta.caller_id":"Llm::CompletionWorker","meta.feature_category":"ai_abstraction_layer","meta.organization_id":1,"meta.remote_ip":"172.83.83.83","meta.user":"user1","meta.user_id":7,"meta.client_id":"user/7","meta.root_caller_id":"GraphqlController#execute","error":"I'm sorry, I couldn't respond in time. Please try again.","duo_chat_error_code":"A1000","source":"chat_v2","message":"Net::ReadTimeout with #\u003cTCPSocket:(closed)\u003e","class":"Gitlab::Llm::Chain::Answer","ai_event_name":"error_returned","ai_component":"duo_chat"}
```
This seems to be related to the performance of the LLM and the amount of context data sent with the request; having the option to increase the timeout would allow long-running requests to complete successfully.
Requested in relation to a customer support ticket (ZD internal link).
## Definition of Done
- Customers can access and update a configurable parameter in the self-hosted AIGW/WFS to control the timeout value of requests to self-hosted models
- Support timeout values ranging from 30 seconds to 600 seconds (10 minutes) with appropriate validation
- There is proper error handling for timeout scenarios with meaningful error messages
- AI Gateway installation documentation has been updated to include new timeout configuration options; add configuration examples for different deployment scenarios (vLLM, AWS Bedrock, Azure OpenAI)
- Troubleshooting documentation has been updated to reference configurable timeout for A1000 errors
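The validation called out above could be sketched as follows. This is a hypothetical helper (none of these names come from the codebase), assuming the 30–600 second range from the Definition of Done and the current 60s default:

```ruby
# Hypothetical timeout-resolution helper: clamps nothing silently,
# rejects out-of-range values with a meaningful error message.
MIN_TIMEOUT_SECONDS = 30
MAX_TIMEOUT_SECONDS = 600
DEFAULT_TIMEOUT_SECONDS = 60

def resolve_timeout(configured)
  # Fall back to today's hardcoded default when nothing is configured.
  return DEFAULT_TIMEOUT_SECONDS if configured.nil?

  value = Integer(configured, exception: false)
  raise ArgumentError, "timeout must be an integer number of seconds" if value.nil?

  unless (MIN_TIMEOUT_SECONDS..MAX_TIMEOUT_SECONDS).cover?(value)
    raise ArgumentError,
      "timeout must be between #{MIN_TIMEOUT_SECONDS} and #{MAX_TIMEOUT_SECONDS} seconds"
  end

  value
end
```

Rejecting invalid values with an explicit error, rather than silently clamping, keeps misconfiguration visible to the administrator.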
## Proposal
- Add a new column `timeout_in_seconds` to `Ai::Setting`, keeping the default as `60`
- In the UI, present this new attribute and allow updates with values from 60 to 600
- Use this value in place of the hardcoded constants in `StepExecutor`, `AiGateway::Client` and `AiGateway::DocsClient`
- Update documentation
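To illustrate the last substitution, here is a minimal sketch (not the actual `AiGateway::Client`; the class and parameter names are illustrative) of how a configured value could replace a hardcoded 60s constant when issuing the HTTP request:

```ruby
require "net/http"
require "uri"

# Illustrative client: the configured timeout, when present, replaces the
# hardcoded default on both connect and read.
class ModelClient
  DEFAULT_TIMEOUT_SECONDS = 60

  attr_reader :timeout

  def initialize(endpoint, timeout_in_seconds: nil)
    @uri = URI.parse(endpoint)
    # Fall back to the current hardcoded default when no setting exists.
    @timeout = timeout_in_seconds || DEFAULT_TIMEOUT_SECONDS
  end

  def post(path, body)
    http = Net::HTTP.new(@uri.host, @uri.port)
    http.open_timeout = @timeout
    http.read_timeout = @timeout # governs the Net::ReadTimeout seen in the logs
    http.post(path, body, "Content-Type" => "application/json")
  end
end
```

Raising `read_timeout` is what directly addresses the `Net::ReadTimeout` in the error log above, since that exception fires when the model takes longer than the configured read window to respond.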