Enabling RAG on Customer Data via the AI Gateway
For customers seeking to customize their Duo features with RAG, we must enable semantic embedding via the AI Gateway while allowing them to keep their data local. This could take different forms, including:
- host embedding models on the AI Gateway
- enable customers to call a self-hosted embedding model via the AI Gateway
Either of the above would allow users to keep their documentation and repositories local, ensuring maximum data privacy. Customers with limited or no security concerns could instead leverage Vertex AI Search for embedding, storage, and retrieval.
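As a rough illustration of the first flow, the sketch below shows how a GitLab instance might request embeddings for locally stored content through the AI Gateway. The endpoint path, payload shape, and response format are assumptions for illustration only, not the actual AI Gateway API; the key point is that only the text chunks leave the instance, while the repositories and documentation themselves stay local.

```python
# Minimal sketch, assuming a hypothetical AI Gateway embeddings endpoint.
import requests

AI_GATEWAY_URL = "https://ai-gateway.example.com"  # hypothetical deployment URL


def embed_texts(texts: list[str], token: str) -> list[list[float]]:
    """Request embedding vectors for a batch of local document/code chunks.

    Only the chunk text is sent to the AI Gateway; the source repos and
    documentation remain on the customer's instance.
    """
    response = requests.post(
        f"{AI_GATEWAY_URL}/v1/embeddings",  # hypothetical endpoint path
        json={"inputs": texts},             # assumed payload shape
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["embeddings"]    # assumed response shape
```

The same call shape would apply whether the embedding model is hosted on the AI Gateway itself or the Gateway proxies the request to a customer's self-hosted model.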
The AI Gateway could also potentially host or configure other aspects of the RAG process, including vector storage and retrieval/re-ranking. Vector storage could live either on the AI Gateway or on the user's GitLab instance. Depending on the retrieval/re-ranking approach, the AI Gateway could also be configured to use a self-hosted re-ranking model.
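The sketch below illustrates the retrieval and re-ranking half of that pipeline under the "vectors stored on the instance" option. The in-memory vector store stands in for whatever storage the instance would actually use, and the re-ranking endpoint, payload, and response format are hypothetical; only `requests` and `numpy` usage reflect real APIs.

```python
# Minimal sketch: local similarity search plus a hypothetical re-ranking call.
import numpy as np
import requests

AI_GATEWAY_URL = "https://ai-gateway.example.com"  # hypothetical deployment URL


def retrieve(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k stored vectors most similar to the query (cosine)."""
    q = np.asarray(query_vec)
    docs = np.asarray(doc_vecs)
    scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(scores)[::-1][:k].tolist()


def rerank(query: str, candidates: list[str], token: str) -> list[str]:
    """Re-order retrieved chunks via a self-hosted re-ranking model behind the
    AI Gateway. Endpoint and payload are assumptions for illustration."""
    response = requests.post(
        f"{AI_GATEWAY_URL}/v1/rerank",  # hypothetical endpoint path
        json={"query": query, "documents": candidates},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()
    # Assume the response lists candidate indices ordered by relevance.
    order = response.json()["ranking"]
    return [candidates[i] for i in order]
```

Whether the nearest-neighbor search runs on the instance (as sketched here) or on the AI Gateway is a deployment choice; the re-ranking step only ever sees the retrieved chunks and the query, so the privacy boundary is the same in either case.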