AI Gateway - Load balance and failover across multiple LLM providers
## Problem to solve

We've previously discussed ways to easily switch from one model to another (e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/555181+ and https://gitlab.com/gitlab-org/gitlab/-/issues/542972+). It is also becoming clear that relying on a single provider at a time is not sufficient to meet the increase in usage of AI actions (see https://gitlab.com/gitlab-org/gitlab/-/issues/525539#note_2662683712).

## Proposal

Design a mechanism that allows multiple models/providers to be available for a specific feature, and that distributes requests across them using some form of load balancing.

## Further details
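
To make the proposal more concrete, here is a minimal sketch of one possible shape for it: weighted random selection across providers, falling back to the remaining providers when a call fails (the failover mentioned in the title). Everything here is an assumption for illustration only: `Provider`, `pick_order`, `complete`, and `AllProvidersFailed` are hypothetical names, the weights are placeholders, and none of this reflects the existing AI Gateway code or a decided design.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider entry; not an existing AI Gateway type."""
    name: str
    weight: float                 # relative share of traffic, assumed > 0
    call: Callable[[str], str]    # sends a prompt, returns a completion


class AllProvidersFailed(Exception):
    """Raised when every configured provider returned an error."""


def pick_order(providers: Sequence[Provider], rng: random.Random) -> List[Provider]:
    """Return all providers in a weighted random order, without repeats.

    The first element is the load-balanced choice; the rest form the
    failover chain, still biased towards higher weights.
    """
    remaining = list(providers)
    order: List[Provider] = []
    while remaining:
        weights = [p.weight for p in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(chosen)
        remaining.remove(chosen)
    return order


def complete(
    prompt: str,
    providers: Sequence[Provider],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Try providers in weighted order; return (provider name, completion)."""
    rng = rng if rng is not None else random.Random()
    errors: List[str] = []
    for provider in pick_order(providers, rng):
        try:
            return provider.name, provider.call(prompt)
        except Exception as exc:  # a real version would catch narrower errors
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A feature would then be configured with a list such as `[Provider("primary", 3.0, call_primary), Provider("secondary", 1.0, call_secondary)]`, where the two callables are placeholders for real client calls. Open questions this sketch does not address include rate-limit awareness, per-provider health tracking, and whether prompts need per-model adaptation before failing over.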
epic