ActiveContext: use indexing_embedding_versions to generate embeddings
What does this MR do and why?
Part 2 of 4 of #534318 (closed):
- !187724 (merged): add migration helper for updating collection metadata
- This MR: use `indexing_embedding_versions` to generate embeddings during indexing
- !188549 (merged): use `search_embedding_version` during searching
- !188560 (merged): update docs to show how to use this workflow to manage embedding models
This MR:
- Adds `collection_class` to metadata. This is needed to access `collection_class::MODELS` from a reference and during search.
  - This value is set whenever we have a migration to update metadata. IMO this isn't too bad since migrations aren't run frequently and this makes sure we have this value when using other metadata fields.
- Updates the embeddings preprocessor to:
  - Loop through the `MODELS` versions to generate and store embeddings.
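The two changes above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this MR: `ExampleCollection`, the shape of `MODELS`, the `METADATA` hash, `fake_embedding`, and `apply_embeddings` are all hypothetical names, and it assumes the preprocessor iterates over the versions listed in `indexing_embedding_versions` and looks each one up in `collection_class::MODELS`.

```ruby
# Hypothetical stand-in for a collection class; the real classes live in the app.
class ExampleCollection
  # Assumed shape: version number => target field and model name.
  MODELS = {
    1 => { field: :embeddings_v1, model: 'example-model-a' },
    2 => { field: :embeddings_v2, model: 'example-model-b' }
  }.freeze
end

# Hypothetical collection metadata, including the collection_class value.
METADATA = {
  collection_class: 'ExampleCollection',
  indexing_embedding_versions: [1, 2],
  search_embedding_version: 1
}.freeze

# Fake generator so the sketch runs without calling an AI gateway.
def fake_embedding(model, text)
  [model.length.to_f, text.length.to_f]
end

# Resolve the collection class from metadata, then generate one embedding
# per indexing version and store it on that version's field.
def apply_embeddings(metadata, document)
  collection_class = Object.const_get(metadata[:collection_class])

  metadata[:indexing_embedding_versions].each_with_object(document.dup) do |version, doc|
    config = collection_class::MODELS.fetch(version)
    doc[config[:field]] = fake_embedding(config[:model], doc[:content])
  end
end

indexed = apply_embeddings(METADATA, { content: 'hello world' })
puts indexed.keys.inspect
```

Storing `collection_class` as a name in metadata is what lets the sketch reach `MODELS` without the caller passing the class around; whether the real implementation resolves it this way is an assumption.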
References
See https://gitlab.com/gitlab-org/gitlab/-/blob/144ec77cd48f682118dbb05417bf688a17d79017/gems/gitlab-active-context/doc/how_to.md for a description of the workflow.
How to set up and validate locally
- Run through the steps in https://gitlab.com/gitlab-org/gitlab/-/blob/144ec77cd48f682118dbb05417bf688a17d79017/gems/gitlab-active-context/doc/how_to.md#set-embedding-model
- Enqueue and execute some docs, and note that they use the specified embedding model (tail the AIGW logs) and that the embeddings are stored in the correct field.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #534318 (closed)
Edited by Madelein van Niekerk