
Draft: feat: poc for routing prompts based on percentage

Igor Drozdov requested to merge id-prompts-router-poc into main

What does this merge request do and why?

This MR demonstrates the following functionality:

  • Reads config.yml from the root of the project. The file is expected to have the following format:
    type: percentage
      - model: claude-3-5-sonnet@20240620
        provider: anthropic
          percentage: 40
      - model: vertex_ai/claude-3-5-sonnet@20240620
        provider: litellm
          percentage: 60
  • This config defines a router that randomly picks between the claude-3-5-sonnet@20240620 model served by the Anthropic API and vertex_ai/claude-3-5-sonnet@20240620 served through LiteLLM: 40% of traffic goes to the former and 60% to the latter
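
The percentage-based selection described above can be sketched in a few lines of Python. This is an illustration only, not the code from this MR: the `Route` dataclass, `ROUTES` list, and `pick_route` function are hypothetical names, and the routes are hard-coded here instead of being read from config.yml. It assumes the percentages are meant to add up to 100.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    # Hypothetical container for one entry of the config
    model: str
    provider: str
    percentage: int


# Mirrors the two entries from the example config (hard-coded for the sketch)
ROUTES = [
    Route("claude-3-5-sonnet@20240620", "anthropic", 40),
    Route("vertex_ai/claude-3-5-sonnet@20240620", "litellm", 60),
]


def pick_route(routes, rng=random):
    """Pick one route with probability proportional to its percentage."""
    weights = [route.percentage for route in routes]
    if sum(weights) != 100:
        raise ValueError("route percentages must add up to 100")
    # random.choices performs weighted sampling with replacement
    return rng.choices(routes, weights=weights, k=1)[0]


# Usage: over many requests the split should approach 40/60
rng = random.Random(0)
counts = Counter(pick_route(ROUTES, rng).provider for _ in range(10_000))
print(counts)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible in tests, while the default falls back to the module-level generator.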

It's an example of a solution that could be applied to Implement a multi-provider strategy to reduce d... (gitlab-org&14873)


  1. Run AI Gateway and visit http://localhost:5052/docs
  2. Send a request to v1/prompts with the prompt ID chat/explain_code and the following params:
  {
    "inputs": {
      "input": "Please explain this Code",
      "selected_text": "console.log('hello');",
      "file_content": "console.log('hello');",
      "language_info": "JavaScript"
    },
    "stream": false
  }
  3. View the logs:

Sometimes it has the following output:

<dependency_injector.providers.Factory(<dependency_injector.providers.Dependency(<class 'object'>) at 0x174700d00, container name: "ContainerApplication.pkg_prompts.models.lite_llm_chat_fn">) at 0x174701120>

Sometimes it's:

<dependency_injector.providers.Factory(<dependency_injector.providers.Dependency(<class 'object'>) at 0x174700c40, container name: "ContainerApplication.pkg_prompts.models.anthropic_claude_chat_fn">) at 0x174700be0>
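
The request from step 2 can also be built from a script instead of the docs page. The sketch below is a rough guess at the request shape, not a confirmed API: it assumes the prompt ID is appended to the v1/prompts path and that the endpoint accepts a JSON POST. `build_request` is a hypothetical helper name. Only the host, port, prompt ID, and params come from the steps above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5052"  # host and port from step 1
PROMPT_ID = "chat/explain_code"     # prompt ID from step 2


def build_request(base_url=BASE_URL, prompt_id=PROMPT_ID):
    """Build (but do not send) the request described in step 2."""
    payload = {
        "inputs": {
            "input": "Please explain this Code",
            "selected_text": "console.log('hello');",
            "file_content": "console.log('hello');",
            "language_info": "JavaScript",
        },
        "stream": False,
    }
    # Assumption: the prompt ID is part of the URL path
    return urllib.request.Request(
        f"{base_url}/v1/prompts/{prompt_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request()
print(request.get_method(), request.full_url)
# With AI Gateway running locally, send it with:
#   urllib.request.urlopen(request)
```

Sending the request several times and watching the logs should show both providers being selected.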
Edited by Igor Drozdov

