
Support for multiple embedding models

Proposal

We need to be able to configure and roll out embeddings with different dimensions. This is useful in the following cases:

  • Air-gapped customers who want to run their own embedding model
  • Experimenting with and changing to a new model

The challenge is that the stored embeddings and the request embeddings (e.g. from a question) must be generated by the same model, otherwise the vectors are not comparable.

We need some way to keep track of the model in use. Rails must set it, and the AI Gateway (AIGW) must use it to determine which model generates the embeddings. Maybe it's a database table:

Table name: EmbeddingVersion

  • id
  • type: code|text
  • dimensions
  • name
  • model
  • status: in_progress|active|deprecated

The table would be used as follows:

  • Embedding requests to the AIGW take EmbeddingVersion.active.for_type(type).model as a param.
  • kNN searches query the "embedding_#{EmbeddingVersion.active.for_type(type).id}" field.
  • The field in the index mapping should be called "embedding_#{EmbeddingVersion.active.for_type(type).id}" and its dimension should be EmbeddingVersion.active.for_type(type).dimensions.
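
The lookups above can be sketched in plain Ruby. This is only an illustration under stated assumptions: the Struct and the EmbeddingVersionRegistry class are hypothetical in-memory stand-ins for the proposed table and its active.for_type(type) scope, and the mapping and query bodies assume an Elasticsearch version that supports the dense_vector field type and the top-level knn search option. The num_candidates value is an arbitrary choice for the example.

```ruby
# Hypothetical in-memory stand-in for the proposed EmbeddingVersion table.
EmbeddingVersion = Struct.new(:id, :type, :dimensions, :name, :model, :status,
                              keyword_init: true) do
  # Name of the Elasticsearch field that stores vectors for this version.
  def field_name
    "embedding_#{id}"
  end

  # Mapping fragment for this version's field; dims comes from the record itself.
  def mapping
    { properties: { field_name => { type: "dense_vector", dims: dimensions } } }
  end
end

# Hypothetical replacement for EmbeddingVersion.active.for_type(type).
class EmbeddingVersionRegistry
  def initialize(versions)
    @versions = versions
  end

  def active_for_type(type)
    version = @versions.find { |v| v.type == type && v.status == :active }
    raise KeyError, "no active embedding version for #{type}" if version.nil?

    version
  end
end

# Builds a kNN search body against the active version's field.
# Rejects query vectors whose length does not match the version's dimensions.
def knn_query(version, query_vector, k: 10)
  unless query_vector.length == version.dimensions
    raise ArgumentError,
          "expected #{version.dimensions} dims, got #{query_vector.length}"
  end

  {
    knn: {
      field: version.field_name,
      query_vector: query_vector,
      k: k,
      num_candidates: k * 10 # assumption: arbitrary multiplier for the sketch
    }
  }
end
```

Because both the field name and the model come from the same record, a query vector can only be compared against stored vectors that were produced by the same model.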

We might need a UI component in which an administrator can configure the models used for embeddings on an instance. Once we know the model and its settings, we can create the mappings and start the embedding indexing process.

[Screenshot: Screenshot_2024-07-11_at_11.39.39]

The idea is that saving these settings kicks off a process:

  1. Create a new EmbeddingVersion record with status in_progress.
  2. Update the mapping in ES by adding a new field called "embedding_#{record.id}" with record.dimensions dims.
  3. Kick off a process similar to Backfill embeddings for gitlab project (#456918 - closed) to generate embeddings with the new model. The AIGW request should take record.model as a param.
  4. Once the backfill is complete, mark the record as active and the previous active record as deprecated.
  5. Kick off a process to remove the deprecated embedding field from Elasticsearch.
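
The five steps can be sketched as a small state machine. This is a minimal in-memory illustration, not the proposed implementation: EmbeddingRollout, its method names, and the mapped_fields hash (which stands in for the Elasticsearch mapping) are all hypothetical, and the block passed to backfill stands in for the AIGW embedding request.

```ruby
# Hypothetical in-memory model of the rollout; nothing here talks to
# Elasticsearch or the AI Gateway.
class EmbeddingRollout
  Version = Struct.new(:id, :type, :dimensions, :model, :status, keyword_init: true)

  attr_reader :versions, :mapped_fields

  def initialize
    @versions = []
    @mapped_fields = {} # field name => dims, stands in for the ES mapping
    @next_id = 1
  end

  # Steps 1 and 2: create an in_progress record and add its mapping field.
  def start(type:, model:, dimensions:)
    record = Version.new(id: @next_id, type: type, dimensions: dimensions,
                         model: model, status: :in_progress)
    @next_id += 1
    @versions << record
    @mapped_fields["embedding_#{record.id}"] = record.dimensions
    record
  end

  # Step 3: generate one vector per text. The block receives the text and
  # record.model, mirroring the model param on the AIGW request.
  def backfill(record, texts)
    texts.map do |text|
      vector = yield(text, record.model)
      unless vector.length == record.dimensions
        raise ArgumentError, "expected #{record.dimensions} dims, got #{vector.length}"
      end
      vector
    end
  end

  # Step 4: activate the record and deprecate the previously active one.
  def complete(record)
    raise ArgumentError, "record is not in_progress" unless record.status == :in_progress

    previous = @versions.select { |v| v.type == record.type && v.status == :active }
    previous.each { |v| v.status = :deprecated }
    record.status = :active
    previous
  end

  # Step 5: drop the fields of deprecated versions; returns the affected ids.
  def cleanup
    stale = @versions.select do |v|
      v.status == :deprecated && @mapped_fields.key?("embedding_#{v.id}")
    end
    stale.each { |v| @mapped_fields.delete("embedding_#{v.id}") }
    stale.map(&:id)
  end
end
```

In this sketch the previously active version keeps its field and its active status until step 4, so searches can continue against the old embeddings while the backfill runs.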

Out of scope: supporting different embedding models per namespace. We will assume instance-wide settings.

Edited by Madelein van Niekerk