Create an endpoint in AI Gateway that accepts payload alongside the desired model
Why are we doing this work
To make individual AI features available to Self-managed customers, we need those features to route requests through the AI Gateway instead of calling third-party LLM providers directly.
Implementation plan
Objective: Create a unified AI Gateway endpoint that can accept payloads containing prompts and selected models, preprocess the data for the chosen model, send it to the corresponding model, and return the response in a standardized format.
This endpoint can be similar to the endpoint currently used by chat. The difference is that this endpoint must accept many models with different interfaces.
Requirements:
- The endpoint should accept a payload containing the prompt text, the selected model identifier, and any additional parameters required by the specific model.
- The AI Gateway should be able to preprocess the payload data into the format expected by the chosen model.
- The preprocessed data should be sent to the corresponding model for processing.
- The response from the model should be post-processed (if necessary) into a standardized format.
- The standardized response should be returned to the client.
- Models that we should include: Anthropic 3.0, Anthropic 2.*, Vertex: Codechat-Bison, Text-Bison, Code-Bison.
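The unified payload described above can be sketched as a small schema with validation. This is a minimal stdlib-only sketch; the field names and model identifier strings are illustrative assumptions, not the Gateway's final contract:

```python
from dataclasses import dataclass, field

# Hypothetical set of supported model identifiers; the real identifiers
# are defined by the AI Gateway, not here.
SUPPORTED_MODELS = {
    "anthropic-3.0",
    "anthropic-2.*",
    "codechat-bison",
    "text-bison",
    "code-bison",
}


@dataclass
class UnifiedPayload:
    """Sketch of the unified request payload the endpoint accepts."""

    prompt: str  # the prompt text
    model: str   # selected model identifier
    params: dict = field(default_factory=dict)  # extra model-specific parameters

    def validate(self) -> None:
        if not self.prompt:
            raise ValueError("prompt must not be empty")
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {self.model}")
```

In a real implementation this validation would likely live in the endpoint's request schema (e.g. a Pydantic model), so invalid payloads are rejected before any provider-specific code runs.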
Implementation Steps:
1. Implement the Endpoint:
- Create a new endpoint in the AI Gateway that accepts POST requests with the payload.
- Validate the incoming payload against the defined schema.
2. Preprocess Payload:
- Implement a preprocessor module or function that handles the preprocessing of the payload based on the selected model.
- This module should have a mapping of model identifiers to specific preprocessing functions.
- The preprocessing function should convert the payload data into the format expected by the corresponding model.
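The mapping of model identifiers to preprocessing functions could look like the sketch below. The model identifiers and Vertex request fields are assumptions for illustration; the `Human:`/`Assistant:` framing follows Anthropic's legacy text-completion prompt format, but the exact shapes should come from each provider's API documentation:

```python
# Illustrative preprocessors mapping the unified payload onto each
# provider's request shape; field names are assumptions, not the real APIs.
def preprocess_anthropic(payload: dict) -> dict:
    return {
        "prompt": f"\n\nHuman: {payload['prompt']}\n\nAssistant:",
        **payload.get("params", {}),
    }


def preprocess_vertex(payload: dict) -> dict:
    return {
        "instances": [{"content": payload["prompt"]}],
        "parameters": payload.get("params", {}),
    }


# Mapping of model identifiers to preprocessing functions.
PREPROCESSORS = {
    "anthropic-3.0": preprocess_anthropic,
    "anthropic-2.*": preprocess_anthropic,
    "codechat-bison": preprocess_vertex,
    "text-bison": preprocess_vertex,
    "code-bison": preprocess_vertex,
}


def preprocess(payload: dict) -> dict:
    try:
        fn = PREPROCESSORS[payload["model"]]
    except KeyError:
        raise ValueError(f"no preprocessor for model: {payload['model']}")
    return fn(payload)
```

Keeping the dispatch in a single mapping means adding a new model is one entry plus one function, with no changes to the endpoint itself.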
3. Send Data to Model:
- Implement a model communication module or function that sends the preprocessed data to the appropriate model and receives its response.
4. Post-process Response:
- Implement a post-processing module or function that converts the model response into a standardized format.
- This module should have a mapping of model identifiers to specific post-processing functions.
- The post-processing function should convert the model response into a common format that can be returned to the client.
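The post-processing side can mirror the preprocessing mapping. The `completion` and `predictions` fields below follow the public Anthropic and Vertex AI response shapes, but treat them, along with the standardized envelope, as illustrative assumptions:

```python
# Illustrative post-processors converting each provider's response shape
# into one common envelope; the response fields are assumptions.
def postprocess_anthropic(response: dict) -> dict:
    return {"text": response.get("completion", ""), "provider": "anthropic"}


def postprocess_vertex(response: dict) -> dict:
    predictions = response.get("predictions", [])
    text = predictions[0].get("content", "") if predictions else ""
    return {"text": text, "provider": "vertex"}


# Mapping of model identifiers to post-processing functions.
POSTPROCESSORS = {
    "anthropic-3.0": postprocess_anthropic,
    "anthropic-2.*": postprocess_anthropic,
    "codechat-bison": postprocess_vertex,
    "text-bison": postprocess_vertex,
    "code-bison": postprocess_vertex,
}


def postprocess(model: str, response: dict) -> dict:
    return POSTPROCESSORS[model](response)
```

Whatever fields the standardized envelope ends up containing, every client sees the same shape regardless of which provider answered the request.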
5. Return Standardized Response:
- Return the post-processed, standardized response to the client.
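Taken together, the steps above amount to a short pipeline. This sketch wires the stages with injected functions (all names are illustrative, and the stubs stand in for the real modules) so each stage stays independently testable:

```python
# Sketch of the endpoint's overall flow: preprocess the payload,
# call the model, post-process, return the standardized response.
def handle_request(payload: dict, preprocess, send_to_model, postprocess) -> dict:
    model = payload["model"]
    request = preprocess(payload)                  # model-specific request shape
    raw_response = send_to_model(model, request)   # provider call
    return postprocess(model, raw_response)        # standardized envelope


# Stub stages standing in for the real modules, to show the wiring.
def fake_preprocess(payload):
    return {"content": payload["prompt"]}


def fake_send(model, request):
    return {"output": request["content"].upper()}


def fake_postprocess(model, response):
    return {"text": response["output"], "model": model}


result = handle_request(
    {"model": "text-bison", "prompt": "hi"},
    fake_preprocess, fake_send, fake_postprocess,
)
# result == {"text": "HI", "model": "text-bison"}
```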
This implementation proposal outlines the high-level steps and considerations for creating a unified AI Gateway endpoint that can support multiple AI models. The specific implementation should be aligned with the AI Gateway blueprint.