ActiveContext: use indexing_embedding_versions to generate embeddings

What does this MR do and why?

Part 2 of 4 of #534318 (closed):

  1. !187724 (merged): add migration helper for updating collection metadata
  2. This MR: use indexing_embedding_versions to generate embeddings during indexing
  3. !188549 (merged): use search_embedding_version during searching
  4. !188560 (merged): update docs to show how to use this workflow to manage embedding models

This MR:

  • Adds collection_class to metadata. This is needed to access collection_class::MODELS from a reference and during search.
    • This value is set whenever a migration updates metadata. IMO this is acceptable: migrations aren't run frequently, and setting it there ensures the value is present whenever the other metadata fields are used.
  • Updates the embeddings preprocessor to:
    • Loop through the MODELS versions to generate and store embeddings.
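
For reviewers, here is a minimal sketch of the idea behind the two changes above. It is not the code in this MR. Only `collection_class`, `MODELS`, and `indexing_embedding_versions` come from the description. `ExampleCollection`, `Ref`, `apply_embeddings`, the shape of `MODELS`, the lambda "models", and storing `collection_class` as a string are all hypothetical stand-ins.

```ruby
# Hypothetical collection. The real MODELS structure may differ; this shape
# (version => field name plus an embedding callable) is an assumption.
class ExampleCollection
  MODELS = {
    1 => { field: :embedding_v1, embed: ->(text) { [text.length.to_f] } },
    2 => { field: :embedding_v2, embed: ->(text) { [text.length.to_f, text.split.size.to_f] } }
  }.freeze
end

# Hypothetical reference: content to embed plus the collection metadata.
Ref = Struct.new(:content, :metadata, :embeddings)

# Assumed preprocessor step: for every version currently being indexed,
# look up the model on the collection class and store the result on the
# field that belongs to that version.
def apply_embeddings(refs)
  refs.each do |ref|
    ref.embeddings ||= {}
    # collection_class is read from metadata so MODELS is reachable from a ref.
    collection_class = Object.const_get(ref.metadata.fetch(:collection_class))

    ref.metadata.fetch(:indexing_embedding_versions).each do |version|
      model = collection_class::MODELS.fetch(version)
      ref.embeddings[model[:field]] = model[:embed].call(ref.content)
    end
  end
end

metadata = { collection_class: 'ExampleCollection', indexing_embedding_versions: [1, 2] }
ref = Ref.new('hello world', metadata, nil)
apply_embeddings([ref])
puts ref.embeddings.keys.inspect
```

Using `fetch` rather than `[]` means a version in the metadata with no matching `MODELS` entry raises immediately instead of silently skipping an embedding. Whether the real preprocessor should fail or skip in that case is a design choice this sketch does not settle.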

References

#534318 (closed)

See https://gitlab.com/gitlab-org/gitlab/-/blob/144ec77cd48f682118dbb05417bf688a17d79017/gems/gitlab-active-context/doc/how_to.md for a description of the intended workflow.

How to set up and validate locally

  1. Run through steps in https://gitlab.com/gitlab-org/gitlab/-/blob/144ec77cd48f682118dbb05417bf688a17d79017/gems/gitlab-active-context/doc/how_to.md#set-embedding-model
  2. Enqueue and execute some docs. Confirm that they use the specified embedding model (tail the AIGW logs) and that the embeddings are stored on the correct field.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #534318 (closed)

Edited by Madelein van Niekerk
