Enable Private Cloud-hosted Models as Self-Managed Custom Models

Overview

Enable customers to use the same model (Vertex or Anthropic) that is used by the .com feature, but deployed in the enterprise's own cloud space on GCP, Azure, or AWS.

This would require adding configuration for self-hosted models that picks up the .com prompts and points to the same type of model in a private cluster.
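Purely as an illustration of the information such a configuration would need to capture (the field names below are invented for this sketch and are not an existing setting), it could look roughly like:

self_hosted_model:
  # Hypothetical sketch only; none of these field names exist today
  feature: code_generation          # reuse the .com prompts of this feature
  provider: vertex_ai               # cloud provider hosting the model (GCP, Azure or AWS)
  model: codestral@2405             # model identifier expected by the provider
  project: <customer-project-id>
  location: us-central1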

PoC

Basically, the functionality that we want to support here can already be achieved with litellm-proxy, as long as we support the prompt for the specified model. For example:

  1. Deploy the model in Model Garden: https://cloud.google.com/model-garden?hl=en. For example, Codestral
  2. Configure the LiteLLM proxy, something like:

model_list:
  - model_name: codestral
    litellm_params:
      model: vertex_ai/codestral@2405
      vertex_ai_project: idrozdov-caf8e304
      vertex_ai_location: us-central1

and start it:

litellm --config config.yaml --detailed_debug

Now, when we request the codestral model via the proxy:

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "codestral", "messages": [{ "role": "user", "content": "Hello!" }] }'

We receive a response.

  3. Configure a self-hosted model as:

(Screenshot_2024-09-11_at_14.10.57: self-hosted model configuration form)

  4. Now the feature is powered by the Cloud-hosted model.

Limitations

  • We want to reuse the Anthropic prompts that we already have to support Anthropic models served via Vertex AI (see the sketch below)
  • We don't want customers to have to set up a LiteLLM proxy themselves
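For reference, this is the PoC workaround we want to avoid asking customers for: the Anthropic-on-Vertex case mentioned above could be covered by the same proxy setup. A minimal sketch, assuming LiteLLM's vertex_ai route also accepts the Anthropic partner models (the model version, project and location values are illustrative):

model_list:
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: vertex_ai/claude-3-5-sonnet@20240620   # Anthropic model served via Vertex AI
      vertex_ai_project: <customer-project-id>
      vertex_ai_location: us-east5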

Proposal

After trying to configure the proxy, it seems that specifying the provider is not enough; we also need to know what name must be used to identify a particular model. For example, we allow setting Codestral as a model and we send codestral as the model name; however, Vertex AI expects codestral@2405. So we either need to resolve this naming internally (for example, we know that for vertex_ai/codestral we need to send vertex_ai/codestral@2405), or we can allow a customer to specify all of this information themselves. We could have two fields (Provider for vertex_ai and Served model name for codestral@2405), or perhaps a single field where the customer specifies the model in this format: https://docs.litellm.ai/docs/providers/vertex#vertex_ai-route.

(Screenshot_2024-09-11_at_14.15.21: proposed model configuration)
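To make the two options concrete, here is a rough sketch of how either form could resolve to the LiteLLM model string; the field names are hypothetical and only illustrate the mapping:

# Option A: two fields, combined internally into <provider>/<served model name>
provider: vertex_ai
served_model_name: codestral@2405
# resolves to model: vertex_ai/codestral@2405

# Option B: a single field in LiteLLM's vertex_ai route format
model: vertex_ai/codestral@2405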

That would also solve the problem that was once mentioned in Slack: a customer wants to specify the served model name because their server expects a name different from the one that we're sending.
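As a sketch of that case with the PoC proxy, assuming the customer's server exposes an OpenAI-compatible endpoint (the host and model names are illustrative), the name that we send can be mapped to the name their server actually expects:

model_list:
  - model_name: codestral                            # the name we send in requests
    litellm_params:
      model: openai/Codestral-22B-v0.1               # the name the customer's server expects
      api_base: https://llm.internal.example.com/v1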
