
Offered models storage system and API endpoint for Model Switching

Consider this interaction for the Model Switching feature

sequenceDiagram
    Self Managed->>Cloud Connector: Scheduled Sidekiq job calls API
    Cloud Connector->>AI Gateway: Fetch Model List
    AI Gateway->>AI Gateway: Fetch Model List
    AI Gateway-->>Cloud Connector: Return list of models
    Cloud Connector-->>Self Managed: Return list of models
    Self Managed->>Self Managed: Insert model records and set defaults
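The self-managed side of the diagram above can be sketched as a small sync routine. The real job is a scheduled Sidekiq job in Ruby; this Python sketch only illustrates the ordering of steps, and all function names are hypothetical:

```python
def sync_offered_models(fetch_models, upsert_model, set_defaults):
    """Sketch of the scheduled sync shown in the diagram (hypothetical names).

    fetch_models():     calls the Cloud Connector API and returns the model list.
    upsert_model(m):    inserts or updates one local model record.
    set_defaults(ms):   marks the default models among the synced records.
    """
    models = fetch_models()
    for model in models:
        upsert_model(model)
    set_defaults(models)
    return models
```

The collaborators are injected so the ordering (fetch, upsert, then set defaults) is visible without committing to any persistence layer.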

Below is what we need from the AIGW (this issue's scope).

Store the metadata for the inference servers we support in the AIGW

It needs to be environment specific. It should list the features each model supports, and each model should have a release state.

  • Create a private repo where we store the YAML file for each environment, for example env/staging-ref/offered_models_metadata.yaml. Make sure that group::ai framework and group::custom models stakeholders have access to it.

  • Store the access token for this repo and the URL to the raw file in the vault as environment variables: MODEL_SWITCHING_METADATA_ACCESS_TOKEN and MODEL_SWITCHING_METADATA_FILE_PATH respectively. Inject them with Runway.

  • In the logic, when both of those env vars are present at startup, the AIGW downloads the file and stores it in the tmp folder, as we already do for the offline documentation search DB.

The YAML files should look something like this:

- name: "Claude 3.7 sonnet"
  identifier: bedrock/anthropic.claude-3-7-sonnet-20250219-v1:0
  api-key: AIGW_MODEL_<ID>_API_KEY
  endpoint: https://example_models.com/v1
  release_state: GA 
  family: claude # Same name as in the prompt file
  features:
  - code-generation
- name: ETC...

Each model's API key value in the file should reference an environment variable injected with Runway (https://docs.runway.gitlab.com/guides/byo-gcp-folder/). Use those references in the YAML file to resolve the API key values.
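Resolving those references at load time could be a small mapping step over the parsed YAML list. This is a sketch with an assumed error-handling policy (fail fast on a missing variable so misconfiguration surfaces at startup):

```python
import os


def resolve_api_keys(models: list[dict]) -> list[dict]:
    """Replace each model's `api-key` env-var reference with its value.

    `models` is the parsed YAML list from the metadata file; each entry's
    `api-key` field names an environment variable (e.g.
    AIGW_MODEL_<ID>_API_KEY) injected by Runway.
    """
    resolved = []
    for model in models:
        entry = dict(model)  # avoid mutating the parsed document
        var_name = entry["api-key"]
        if var_name not in os.environ:
            raise RuntimeError(f"missing API key env var: {var_name}")
        entry["api-key"] = os.environ[var_name]
        resolved.append(entry)
    return resolved
```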

Build an endpoint to get the list of offered models

The response should look like this:

[
  {
    "name": "GitLab: Claude 3.7 Sonnet",
    "id": "claude_3_7_sonnet",
    "release_state": "GA",
    "features": ["duo_chat", "code-generation"]
  },
  {...}
]
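The endpoint handler would essentially map the stored metadata onto this response shape. A framework-agnostic sketch of that mapping; deriving `id` by slugifying the display name is an assumption, as is the "GitLab: " prefix being added at this layer:

```python
def offered_models_response(metadata: list[dict]) -> list[dict]:
    """Map stored metadata entries to the public endpoint response shape.

    Field names follow the example response in this issue. `id` is derived
    here by slugifying the display name (hypothetical; the real identifier
    scheme may differ).
    """
    return [
        {
            "name": f"GitLab: {m['name']}",
            "id": m["name"].lower().replace(" ", "_").replace(".", "_"),
            "release_state": m["release_state"],
            "features": m.get("features", []),
        }
        for m in metadata
    ]
```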

Current roster for offered models

Code Completion:

  • Codestral on Fireworks (default)
  • Codestral on Vertex (GA)
  • Qwen 2.5 7B on Fireworks (Beta)

Code Generation:

  • Claude 3.7 Sonnet

For reference.

Edited by Patrick Cyiza