# Model Routing within Duo Features

## Overview
For self-managed GitLab Duo, some sophisticated customers can host multiple models across a combination of enterprise infrastructure and private clouds. Self-hosted and semi-airgapped customers want the ability to route LLM inputs based not only on the Duo feature, but also on the input itself or on attributes of the individual user. It may be preferable to route a given request to different LLMs depending on:
- Cost
- Information security
- User usage budgets

## Proposal
- Allow a self-managed customer to host more than one model for a given Duo feature
- The customer defines rules that route requests to alternative models based on document or user permissions; the elements we capture in customer-hosted logging will determine how we can inform model routing. Illustrative sketches follow each routing category below.

### Document-based model routing
- Project membership
- Project supergroups
- CODEOWNERS
- Data security classification level of the user’s current file, repo, or group
- GitLab Control Framework
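
A minimal sketch of how document-based rules might be expressed, assuming hypothetical classification labels, endpoint names, and a document-context structure; none of these identifiers exist in GitLab today.

```python
from dataclasses import dataclass, field

# Hypothetical endpoint names; placeholders for whatever the customer hosts.
ON_PREM_MODEL = "on-prem-open-source"
PRIVATE_CLOUD_MODEL = "private-cloud"

@dataclass
class DocumentContext:
    """Attributes of the document a request originates from (assumed to be available)."""
    project_path: str                     # e.g. "supergroup/subgroup/project"
    classification: str                   # e.g. "public", "internal", "restricted"
    codeowners: list[str] = field(default_factory=list)  # resolved CODEOWNERS entries

def route_by_document(doc: DocumentContext) -> str:
    """Keep restricted or CODEOWNERS-protected content on the on-prem model."""
    if doc.classification == "restricted":
        return ON_PREM_MODEL
    if doc.codeowners:
        return ON_PREM_MODEL
    return PRIVATE_CLOUD_MODEL
```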

### User-based model routing
- Simple list of user IDs
- User roles
- User group and supergroup membership
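
A comparable sketch for user-based rules; the user IDs, role names, and group paths below are placeholders, not real configuration.

```python
# Placeholder rule data; a real deployment would load this from customer configuration.
RESTRICTED_USER_IDS = {42, 1138}          # simple list of user IDs
RESTRICTED_GROUPS = {"acme/security"}     # group or supergroup membership

def route_by_user(user_id: int, role: str, groups: set[str]) -> str:
    """Keep requests from restricted users, roles, or groups on the on-prem model."""
    if user_id in RESTRICTED_USER_IDS or role in {"guest", "external"}:
        return "on-prem-open-source"
    if groups & RESTRICTED_GROUPS:
        return "on-prem-open-source"
    return "private-cloud"
```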

### End-user-controlled model routing
- The end user can select which model an input is routed to; the user classifies their own input

### Context-based model routing
- Text within the input
  - Example: PII detection
  - Example: a zero-shot model determines the text to be of high sensitivity
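
A sketch of context-based routing in which a naive regex check stands in for a real PII detector or zero-shot sensitivity classifier.

```python
import re

# Naive PII patterns, purely to illustrate the routing decision;
# a production detector or zero-shot model would replace them.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN-like numbers
]

def route_by_context(prompt: str) -> str:
    """Send inputs containing apparent PII to the on-prem model."""
    if any(pattern.search(prompt) for pattern in PII_PATTERNS):
        return "on-prem-open-source"
    return "private-cloud"
```

In practice the customer would decide the order in which document, user, and context predicates are evaluated, with the first (or most restrictive) match winning.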

## Definition of Done
When a user engages with Duo Chat or Code Suggestions:
- their input is automatically routed to either the on-prem OS model or a private cloud hosted model
- the rules by which this automation happens are configurable by the customer
- the user has the option to manually override the input to go to the on-prem OS model, but cannot override it to go to the private cloud hosted model (see the sketch below)
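
A minimal sketch of the override constraint above, using the same placeholder endpoint names as the earlier sketches: an override toward the on-prem open-source model is honoured, while an override toward the private cloud model is ignored.

```python
from typing import Optional

def resolve_model(auto_choice: str, user_override: Optional[str]) -> str:
    """Apply the automatic routing decision, honouring only overrides to on-prem."""
    if user_override == "on-prem-open-source":
        return user_override       # the user may always pull a request back on-prem
    # No override, or an attempted override to the private cloud model:
    # keep the automatically selected endpoint.
    return auto_choice
```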