GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway · Issue #2273

Make direct_access API endpoint resilient to upstream LLM provider outages

## Problem

An outage at Anthropic (Claude) caused the GitLab monolith's calls to the following endpoint to hang for 30 seconds, saturating the Puma workers:

- POST `/v1/code/user_access_token`

It is called from these Rails endpoints:

- POST `/api/v4/code_suggestions/direct_access`
- POST `/api/v4/ai/third_party_agents/direct_access`

See https://log.gprd.gitlab.net/app/r/s/tj0pq for examples of these slow requests.

## Root Cause

The direct_access API endpoints lack resilience mechanisms for upstream provider outages. When the Claude API experiences issues, requests accumulate and exhaust the available worker threads.

## Solution

Improve the resilience of the direct_access API endpoint so that it handles upstream LLM provider outages without saturating workers.

## Related Issues

- [#21728 Incident Review: Puma Saturation for ai-assisted](https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21728)
- [#28660 Handle cascading error from the claude API better](https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28660)
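The Solution section leaves the mechanism open. One common pattern for this failure mode is to bound each upstream call with a short timeout and wrap it in a circuit breaker, so that once the provider is failing, callers are rejected immediately instead of each holding a worker for the full 30 seconds. The sketch below is a minimal, hypothetical asyncio version of that pattern, not existing AI Gateway code or an agreed design: `CircuitBreaker`, `CircuitOpenError`, `demo`, and the `failure_threshold`, `reset_after`, and `timeout` values are all illustrative assumptions.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open: the call is rejected without contacting upstream."""


class CircuitBreaker:
    """Hypothetical breaker (illustrative only, not existing AI Gateway code).

    Opens after `failure_threshold` consecutive failures or timeouts, rejects
    calls for `reset_after` seconds, then lets calls through again as probes.
    """

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the cool-down has elapsed the breaker is "half-open":
        # calls go through, and a single further failure re-opens it.
        return self._clock() - self._opened_at < self.reset_after

    async def call(self, func: Callable[[], Awaitable[T]], timeout: float) -> T:
        if self.is_open:
            raise CircuitOpenError("upstream circuit is open; failing fast")
        try:
            # Bound how long one request can hold a worker waiting on upstream.
            result = await asyncio.wait_for(func(), timeout=timeout)
        except Exception:
            self._record_failure()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


async def demo() -> List[str]:
    async def hung_provider() -> str:
        await asyncio.sleep(10)  # stands in for a provider that never answers
        return "token"

    breaker = CircuitBreaker(failure_threshold=2, reset_after=60.0)
    outcomes: List[str] = []
    for _ in range(3):
        try:
            await breaker.call(hung_provider, timeout=0.05)
            outcomes.append("ok")
        except CircuitOpenError:
            outcomes.append("rejected")
        except asyncio.TimeoutError:
            outcomes.append("timeout")
    return outcomes


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['timeout', 'timeout', 'rejected']
```

In the demo, the first two calls each wait only 50 ms before timing out, and the third is rejected without touching the provider at all. For brevity, this sketch lets every concurrent call through while half-open; a production breaker would usually admit a single probe at a time.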
