Load balancer - Update clients to support provider stickiness for token caching
## Problem to solve
With the LiteLLM load balancer distributing requests across multiple providers (e.g., Vertex, Bedrock, Anthropic), subsequent requests from the same client may be routed to different providers. This breaks **token caching**, which relies on the same provider receiving consecutive turns of a conversation.
## Proposal
The AI Gateway should return the randomly selected model/provider to the client as part of the response. Clients should then pass that model identifier back in subsequent requests to ensure provider stickiness across conversation turns.
This requires changes in the following places:
* GitLab Rails
* GitLab LSP
Gate this rollout behind a feature flag.
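
The round trip proposed above could look roughly like the following. This is a minimal sketch and not the AI Gateway or LiteLLM implementation: the provider list, the `provider` response field, and the names `select_provider`, `handle_request`, and `ConversationClient` are all hypothetical and used only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical provider pool (assumption); the real list lives in the
# load balancer configuration.
PROVIDERS = ["vertex", "bedrock", "anthropic"]


def select_provider(preferred: Optional[str], rng: random.Random) -> str:
    """Gateway side: honor the client's preferred provider when it is a
    known provider, otherwise fall back to a random pick."""
    if preferred in PROVIDERS:
        return preferred
    return rng.choice(PROVIDERS)


def handle_request(prompt: str, preferred: Optional[str], rng: random.Random) -> dict:
    """Gateway side: echo the chosen provider back in the response.
    The "provider" field name is an assumption, not an existing API."""
    provider = select_provider(preferred, rng)
    return {"completion": f"[{provider}] reply to: {prompt}", "provider": provider}


@dataclass
class ConversationClient:
    """Client side: remember the provider returned for this conversation
    and send it back on every later turn."""
    gateway: Callable[[str, Optional[str]], dict]
    provider: Optional[str] = None

    def send(self, prompt: str) -> dict:
        response = self.gateway(prompt, self.provider)
        # Store the provider so the next turn sticks to it.
        self.provider = response["provider"]
        return response


if __name__ == "__main__":
    rng = random.Random()
    client = ConversationClient(gateway=lambda p, pref: handle_request(p, pref, rng))
    for turn in ["hello", "tell me more", "thanks"]:
        print(client.send(turn)["provider"])
```

The first turn carries no preference, so the gateway picks at random; every later turn in the same conversation reuses that choice. Falling back to a random pick when the preferred provider is unknown keeps older clients working if the provider pool changes.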