POC - Implement a LiteLLM load balancer
Problem to solve
Since we plan to integrate multiple LLM providers (Vertex, Bedrock), we need to build a request distribution system that routes identical feature requests across different providers and models to optimize our capacity utilization.
Proposal
Let's use LiteLLM's routing implementation (`litellm.Router`) to distribute requests across provider/model combinations. For the first iteration, we will keep routing simple (e.g., static weighting or round-robin) without token-, latency-, or error-rate-based strategies. In later iterations, we can introduce Redis as the state store to enable advanced routing strategies based on token consumption, latency, and error rates.
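As a rough sketch of the first iteration, the configuration below builds a LiteLLM `model_list` that maps one shared alias onto two provider/model deployments with static weights. The alias name, model IDs, and weights are placeholders, not decided values; the `Router` call itself is shown commented out since it requires provider credentials.

```python
# Sketch: two deployments behind one alias, weighted statically.
# Model IDs ("vertex_ai/…", "bedrock/…") and weights are placeholders.
model_list = [
    {
        "model_name": "gitlab-default",  # shared alias that features request
        "litellm_params": {
            "model": "vertex_ai/gemini-1.5-pro",
            "weight": 2,  # static weighting: roughly 2/3 of traffic
        },
    },
    {
        "model_name": "gitlab-default",
        "litellm_params": {
            "model": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
            "weight": 1,  # roughly 1/3 of traffic
        },
    },
]

# With credentials configured, routing would look like:
#
#   from litellm import Router
#   router = Router(model_list=model_list, routing_strategy="simple-shuffle")
#   response = router.completion(
#       model="gitlab-default",
#       messages=[{"role": "user", "content": "Hello"}],
#   )

aliases = {entry["model_name"] for entry in model_list}
print(aliases)
```

Later iterations could swap `routing_strategy` for a stateful one (e.g., usage- or latency-based) once Redis is available as the shared state store.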
Further details
- We need to modify our unit primitives' default model data structure from its current format to an array, then transform our existing data structure to match LiteLLM's expected format.
- Load balancing only applies to the “GitLab default” model, i.e., any models that users have explicitly selected via model selection will not be affected by the load balancing mechanism.
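The data-structure change above can be sketched as a two-step transform: widen the primitive's scalar default-model field into an array, then map that array onto LiteLLM's expected `model_list` entries. The primitive's field names (`default_model`, `default_models`) and the example values are assumptions, not the actual schema.

```python
def to_model_array(primitive: dict) -> dict:
    """Widen a scalar `default_model` field into a `default_models` array.

    Field names are hypothetical stand-ins for the real primitive schema.
    """
    out = dict(primitive)
    default = out.pop("default_model", None)
    out["default_models"] = [default] if default else []
    return out


def to_litellm_model_list(primitive: dict, alias: str = "gitlab-default") -> list:
    """Map the default-model array onto LiteLLM's model_list shape."""
    return [
        {"model_name": alias, "litellm_params": {"model": model}}
        for model in primitive.get("default_models", [])
    ]


# Example with a placeholder primitive:
primitive = {"name": "duo_chat", "default_model": "vertex_ai/gemini-1.5-pro"}
converted = to_model_array(primitive)
model_list = to_litellm_model_list(converted)
print(model_list)
```

User-selected models would bypass this transform entirely and keep resolving to a single fixed model, so only the “GitLab default” path ever produces a multi-entry `model_list`.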
Links / references
Edited by Martin Wortschack