RAG for Model Customization Notes
This document captures combined notes on RAG and how best to implement RAG for model customization.
Meeting Notes
Framework Notes
- We don't need to reinvent the wheel; a lot of foundational RAG exploration has already been done by Framework. Everything is in place for some PoCs, pending a decision on embedding storage (pgvector vs. Elasticsearch)
- Framework is happy to collaborate with CM on how to build/consume the service; CM can be carved into the workload
- There are some RAG(ish) elements on the platform already (context injection in Duo Chat), but we don't have the ability to pre-create embedding representations of GitLab.org. Embedding everything, dynamic updates, and build pipelines all require a decision about the embedding store, the storage itself, and which service performs the search over the store
- Addition of an embeddings API
- Vector storage: the current debate is pgvector (PGV) vs. Elasticsearch (ES); see the setup sketch after these notes
  - PGV, because we already have Postgres (the discussion is around how we get to self-managed)
  - ES is not as widely adopted on self-managed; PG is "already there"
  - BUT GitLab.com, and 30-40% of self-managed in terms of ARR, already have ES installed… (the biggest self-managed customers have ES)
- What would a RAG implementation look like at the different service levels?
  - GitLab.com:
  - Dedicated:
    - Would embeddings be stored locally?
    - Would embedding API calls go through the centralized Framework service?
  - Self-managed:
    - The self-managed implementation is still a bugbear
    - Self-managed RAG and embeddings with Elasticsearch doc
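To make the PGV vs. ES debate concrete, here is a minimal sketch of what each store's setup could look like, assuming psycopg for Postgres/pgvector and the official `elasticsearch` Python client; the connection strings, the `embeddings` table/index name, and the 768 vector dimension are all illustrative, not an agreed schema.

```python
# Illustrative setup for each candidate embedding store; names and the
# vector dimension are hypothetical, not an agreed schema.

# Option 1: pgvector - embeddings live in the Postgres we "already have".
import psycopg

with psycopg.connect("dbname=gitlab") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS embeddings (
               id text PRIMARY KEY,
               content text,
               embedding vector(768)  -- dimension depends on the embedding model
           )"""
    )

# Option 2: Elasticsearch - a dense_vector mapping enables kNN search.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.create(
    index="embeddings",
    mappings={
        "properties": {
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)
```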
RAG Elements
indexing
- approaches
  - dense vector similarity search (Elasticsearch)
  - keyword search
    - BM25 (Best Matching 25)
- tokenization considerations (chunking granularity)
  - document
  - function
  - line
vector storage
user query
- processing and tokenization
LLM
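Taken together, these elements form a simple flow: index and chunk the source, embed the chunks into a vector store, embed the user query the same way, retrieve the nearest chunks, and inject them into the LLM prompt. Below is a minimal, self-contained Python sketch of that flow; the `embed` function is a toy stand-in for a real embeddings API, the in-memory list stands in for the undecided store (PGV vs. ES), and the final LLM call is left as a print.

```python
# Minimal RAG pipeline sketch; `embed` is a placeholder for a real
# embeddings API, and the in-memory list stands in for the vector store.
import math

def embed(text: str) -> list[float]:
    # Placeholder: a real pipeline would call an embeddings API here.
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# 1. Indexing: chunk documents (here, per line) and pre-create embeddings.
documents = {"doc1": "def add(a, b):\n    return a + b"}
store = []  # stand-in for pgvector / Elasticsearch
for doc_id, text in documents.items():
    for i, chunk in enumerate(text.splitlines()):
        store.append((f"{doc_id}#{i}", chunk, embed(chunk)))

# 2. User query: embed with the same model, retrieve the nearest chunks.
query = "function that adds two numbers"
qvec = embed(query)
top = sorted(store, key=lambda row: cosine(qvec, row[2]), reverse=True)[:2]

# 3. LLM: inject the retrieved context into the prompt.
context = "\n".join(chunk for _, chunk, _ in top)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
print(prompt)  # a real pipeline would send this to the LLM
```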
Proposed Pipelines
- Using BM25 in Elasticsearch to find relevant code documents (no embeddings or semantic search): https://gitlab.com/shinya.maeda/code-generation-elasticsearch-bm25. Video: https://youtu.be/2Ub70Ow8yag?feature=shared
- BM25 (Best Matching 25) is a ranking function used in information retrieval to rank documents by their relevance to a given search query. It is an extension of the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme, which is widely used for text retrieval. (No embedding store, no semantic search; a scoring sketch follows this list.)
- Moving Duo Chat embeddings from `pg_vector` to Elasticsearch: !145392 (closed).
- Repository X-Ray RAG:
  - Using `pg_vector`: !142912 (closed). Summary can be found here.
  - Using Elasticsearch: !144715 (closed). Summary can be found here.
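For reference, the BM25 function described above can be written down directly. The following is a small, self-contained Python sketch of BM25 scoring (using the Lucene-style IDF that Elasticsearch applies); it is illustrative, not Elasticsearch's actual implementation, and `k1=1.2`, `b=0.75` are the conventional parameter defaults.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n          # average document length
    freqs = [Counter(d) for d in docs]             # term frequencies per doc
    scores = [0.0] * n
    for term in query_terms:
        df = sum(1 for f in freqs if term in f)    # document frequency
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, f in enumerate(freqs):
            tf = f[term]                           # 0 if the doc lacks the term
            norm = k1 * (1 - b + b * len(docs[i]) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / (tf + norm)
    return scores

docs = [d.split() for d in ["find relevant code documents",
                            "rank documents by relevance to a query"]]
print(bm25_scores("relevant documents".split(), docs))
```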
Other Proposals
- AI Framework should introduce an abstraction layer that works with any vector store (pgvector => Elasticsearch); a hypothetical interface sketch follows
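A sketch of what such a layer could look like, assuming a psycopg connection for pgvector (with the schema from the setup sketch above) and the official `elasticsearch` Python client; all class and method names here are illustrative, not an existing AI Framework API.

```python
# Hypothetical sketch of the proposed abstraction layer; class and method
# names are illustrative, not an existing AI Framework API.
from abc import ABC, abstractmethod

class VectorStore(ABC):
    """Backend-agnostic interface: callers never touch pgvector or ES directly."""

    @abstractmethod
    def upsert(self, doc_id: str, embedding: list[float], text: str) -> None: ...

    @abstractmethod
    def search(self, embedding: list[float], k: int = 5) -> list[str]: ...

class PgVectorStore(VectorStore):
    def __init__(self, conn):
        self.conn = conn  # a psycopg connection; "embeddings" schema assumed

    def upsert(self, doc_id, embedding, text):
        # pgvector accepts the '[1.0, 2.0, ...]' text format, hence str(embedding)
        self.conn.execute(
            "INSERT INTO embeddings (id, embedding, content) VALUES (%s, %s::vector, %s) "
            "ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding, "
            "content = EXCLUDED.content",
            (doc_id, str(embedding), text),
        )

    def search(self, embedding, k=5):
        # <=> is pgvector's cosine distance operator
        rows = self.conn.execute(
            "SELECT content FROM embeddings ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(embedding), k),
        ).fetchall()
        return [content for (content,) in rows]

class ElasticsearchStore(VectorStore):
    def __init__(self, client, index="embeddings"):
        self.client = client  # an elasticsearch.Elasticsearch client
        self.index = index

    def upsert(self, doc_id, embedding, text):
        self.client.index(index=self.index, id=doc_id,
                          document={"embedding": embedding, "content": text})

    def search(self, embedding, k=5):
        resp = self.client.search(
            index=self.index,
            knn={"field": "embedding", "query_vector": embedding,
                 "k": k, "num_candidates": 10 * k},
        )
        return [hit["_source"]["content"] for hit in resp["hits"]["hits"]]
```

With an interface like this, moving from pgvector to Elasticsearch (or back) becomes a constructor change rather than a rewrite of every consumer.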
Validation Process
References
- Figma from a workshop: https://www.figma.com/file/hsDkrLEghaTidehqDFvTy2/Embeddings-Workshop?type=whiteboard&node-id=0-1&t=X9im3MxQAO2YKJoK-0
- Summary: Global Search & AI Framework RAG Workshop - Session 1: https://docs.google.com/document/d/19RDKyy5MgcwyX1jWbc2I0zxHstufIS2OFvKvbgry3rc/edit#heading=h.chc7ph9dx4ua
- RAG eval: #443321 (comment 1791405038)