Offered models storage system and API endpoint for Model Switching
Consider this interaction for the Model Switching feature:

```mermaid
sequenceDiagram
    Self Managed->>Cloud Connector: Scheduled Sidekiq job calls API
    Cloud Connector->>AI Gateway: Fetch Model List
    AI Gateway->>AI Gateway: Fetch Model List
    AI Gateway-->>Cloud Connector: Return list of models
    Cloud Connector-->>Self Managed: Return list of models
    Self Managed->>Self Managed: Insert model records and set defaults
```
Below is what we need from the AIGW (this issue's scope).
Store the metadata for the inference servers we support in the AIGW
The metadata needs to be environment-specific. It should list the features each model supports, and each model should have a release state.
- Create a private repo where we store the YML file for each environment, for example `env/staging-ref/offered_models_metadata.yaml`. Make sure that group::ai framework and group::custom models stakeholders have access to it.
- Store the access token to this repo and the URL to the raw file in the vault as environment variables, `MODEL_SWITCHING_METADATA_ACCESS_TOKEN` and `MODEL_SWITCHING_METADATA_FILE_PATH` respectively. Inject them with Runway.
- In the logic, when both of those env vars are present at startup, the AIGW downloads the file and stores it in the tmp folder, as we already do for the offline documentation search DB (see the sketch after this list).
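A minimal sketch of that startup step, assuming the raw file is fetched over HTTP with a GitLab private token header; the function, header, and path names below are illustrative, not the actual AIGW implementation:

```python
import os
import tempfile
from pathlib import Path

import requests

# Illustrative location under tmp, mirroring how the offline documentation
# search DB is handled.
METADATA_DIR = Path(tempfile.gettempdir()) / "offered_models"


def download_offered_models_metadata() -> Path | None:
    """Download the offered-models YAML at startup when both env vars are present."""
    token = os.environ.get("MODEL_SWITCHING_METADATA_ACCESS_TOKEN")
    file_url = os.environ.get("MODEL_SWITCHING_METADATA_FILE_PATH")
    if not token or not file_url:
        # The feature is opt-in: skip when the env vars are not injected.
        return None

    # Assumes the URL points at the raw file and accepts a GitLab private token.
    response = requests.get(file_url, headers={"PRIVATE-TOKEN": token}, timeout=10)
    response.raise_for_status()

    METADATA_DIR.mkdir(parents=True, exist_ok=True)
    target = METADATA_DIR / "offered_models_metadata.yaml"
    target.write_bytes(response.content)
    return target
```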
The YAML files should look something like this:
- name: "Claude 3.7 sonnet"
identifier: bedrock/anthropic.claude-3-7-sonnet-20250219-v1:0
api-key: AIGW_MODEL_<ID>_API_KEY
endpoint: https://example_models.com/v1
release_state: GA
family: claude # Same name as in the prompt file
features:
- code-generation
- name: ETC...
Each model's API key value in the file should reference an environment variable injected with Runway (https://docs.runway.gitlab.com/guides/byo-gcp-folder/). Use those references in the YML file to resolve the API key values.
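A sketch of how such a reference could be resolved when the file is loaded, assuming the `api-key` field holds the name of a Runway-injected environment variable (the helper name is illustrative):

```python
import os

import yaml  # PyYAML


def load_offered_models(path: str) -> list[dict]:
    """Parse the offered-models YAML and resolve each API key from the environment."""
    with open(path) as f:
        models = yaml.safe_load(f) or []

    for model in models:
        env_var = model.get("api-key")  # e.g. AIGW_MODEL_<ID>_API_KEY
        # Resolve the reference; a missing variable is kept as None so callers can
        # decide whether the model is usable in this environment.
        model["api_key_value"] = os.environ.get(env_var) if env_var else None
    return models
```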
Build an endpoint to get the list of the offered models
The response should look like this:
```json
[
  {
    "name": "GitLab: Claude 3.7 Sonnet",
    "id": "claude_3_7_sonnet",
    "release_state": "GA",
    "features": ["duo_chat", "code-generation"]
  },
  {...}
]
```
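A minimal sketch of such an endpoint, assuming a FastAPI-style router reading the metadata file downloaded at startup; the route path and the mapping from metadata fields to the response are assumptions, not the final API:

```python
import tempfile
from pathlib import Path

import yaml
from fastapi import APIRouter

router = APIRouter()

# Same illustrative tmp location as in the startup sketch above.
METADATA_FILE = Path(tempfile.gettempdir()) / "offered_models" / "offered_models_metadata.yaml"


@router.get("/v1/models")  # illustrative path, not the final route
def list_offered_models():
    """Return the offered models from the environment-specific metadata file."""
    models = yaml.safe_load(METADATA_FILE.read_text()) or []
    return [
        {
            "name": model["name"],
            # How the public id (e.g. claude_3_7_sonnet) is derived from the
            # metadata is not settled in this issue; an explicit field is assumed here.
            "id": model.get("id"),
            "release_state": model.get("release_state"),
            "features": model.get("features", []),
        }
        for model in models
    ]
```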
Current roster for offered models, for reference:

Code Completion:
- Codestral on Fireworks (default)
- Codestral on Vertex (GA)
- Qwen 2.5 7B on Fireworks (Beta)

Code Generation:
- Claude 3.7 Sonnet