RAG for Model Customization Notes
This document captures combined notes on RAG and how best to implement RAG for model customization.
Meeting Notes
Framework Notes
- We don't need to reinvent the wheel; a lot of foundational RAG exploration has already been done by Framework. Everything is in place for some PoCs, pending a decision on embedding storage (pgvector vs. Elasticsearch)
- Framework is happy to collaborate with CM on how to build/consume the service; CM can be carved into the workload
- There are some RAG(ish) elements on the platform already (context injection in Duo Chat), but we don't have the ability to pre-create embedding representations of GitLab.org. Embedding everything, dynamic updates, and build pipelines all require a decision about the embedding store, the storage itself, and which service performs the search over the store
- Addition of an embeddings API
- Vector storage: the current debate is pgvector (PGV) vs. Elasticsearch (ES); see the setup sketch after these notes
  - PGV, because we already have Postgres (the discussion is around how we get to self-managed)
  - ES is not as widely adopted on self-managed; PG is "already there"
  - BUT GitLab.com, and 30-40% of self-managed in terms of ARR, already have ES installed… (the biggest self-managed customers have ES)
- What would a RAG implementation look like at the different service levels?
  - GitLab.com:
  - Dedicated:
    - Would embeddings be stored locally?
    - Would embedding API calls go through the centralized Framework service?
  - Self-managed:
    - The self-managed implementation is still a bugbear
    - Self-managed RAG and embeddings with Elasticsearch doc
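To make the PGV vs. ES debate concrete, here is a minimal sketch of what each store's setup could look like, assuming psycopg for Postgres/pgvector and the official `elasticsearch` Python client; the connection strings, the `embeddings` table/index name, and the 768 vector dimension are all illustrative, not an agreed schema.

```python
# Illustrative setup for each candidate embedding store; names and the
# vector dimension are hypothetical, not an agreed schema.

# Option 1: pgvector - embeddings live in the Postgres we "already have".
import psycopg

with psycopg.connect("dbname=gitlab") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS embeddings (
               id text PRIMARY KEY,
               content text,
               embedding vector(768)  -- dimension depends on the embedding model
           )"""
    )

# Option 2: Elasticsearch - a dense_vector mapping enables kNN search.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.create(
    index="embeddings",
    mappings={
        "properties": {
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)
```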
RAG Elements
indexing
- approaches
  - dense vector similarity search (Elasticsearch)
  - keyword search
    - BM25 (Best Matching 25)
- tokenization considerations (chunking granularity)
  - document
  - function
  - line
vector storage
user query
- processing and tokenization
LLM
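Taken together, these elements form a simple flow: index and chunk the source, embed the chunks into a vector store, embed the user query the same way, retrieve the nearest chunks, and inject them into the LLM prompt. Below is a minimal, self-contained Python sketch of that flow; the `embed` function is a toy stand-in for a real embeddings API, the in-memory list stands in for the undecided store (PGV vs. ES), and the final LLM call is left as a print.

```python
# Minimal RAG pipeline sketch; `embed` is a placeholder for a real
# embeddings API, and the in-memory list stands in for the vector store.
import math

def embed(text: str) -> list[float]:
    # Placeholder: a real pipeline would call an embeddings API here.
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# 1. Indexing: chunk documents (here, per line) and pre-create embeddings.
documents = {"doc1": "def add(a, b):\n    return a + b"}
store = []  # stand-in for pgvector / Elasticsearch
for doc_id, text in documents.items():
    for i, chunk in enumerate(text.splitlines()):
        store.append((f"{doc_id}#{i}", chunk, embed(chunk)))

# 2. User query: embed with the same model, retrieve the nearest chunks.
query = "function that adds two numbers"
qvec = embed(query)
top = sorted(store, key=lambda row: cosine(qvec, row[2]), reverse=True)[:2]

# 3. LLM: inject the retrieved context into the prompt.
context = "\n".join(chunk for _, chunk, _ in top)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
print(prompt)  # a real pipeline would send this to the LLM
```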
Proposed Pipelines
- Using BM25 in Elasticsearch to find relevant code documents (no embeddings or semantic search): https://gitlab.com/shinya.maeda/code-generation-elasticsearch-bm25. Video: https://youtu.be/2Ub70Ow8yag?feature=shared
- BM25 (Best Matching 25) is a ranking function used in information retrieval to rank documents by their relevance to a given search query. It is an extension of the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme, which is widely used for text retrieval. (No embedding store, no semantic search; a scoring sketch follows this list.)
- Moving Duo Chat embeddings from `pg_vector` to Elasticsearch: !145392 (closed).
- Repository X-Ray RAG:
  - Using `pg_vector`: !142912 (closed). Summary can be found here.
  - Using Elasticsearch: !144715 (closed). Summary can be found here.
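For reference, the BM25 function described above can be written down directly. The following is a small, self-contained Python sketch of BM25 scoring (using the Lucene-style IDF that Elasticsearch applies); it is illustrative, not Elasticsearch's actual implementation, and `k1=1.2`, `b=0.75` are the conventional parameter defaults.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n          # average document length
    freqs = [Counter(d) for d in docs]             # term frequencies per doc
    scores = [0.0] * n
    for term in query_terms:
        df = sum(1 for f in freqs if term in f)    # document frequency
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, f in enumerate(freqs):
            tf = f[term]                           # 0 if the doc lacks the term
            norm = k1 * (1 - b + b * len(docs[i]) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / (tf + norm)
    return scores

docs = [d.split() for d in ["find relevant code documents",
                            "rank documents by relevance to a query"]]
print(bm25_scores("relevant documents".split(), docs))
```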
Other Proposals
- AI Framework should introduce an abstraction layer that works with any vector store (pgvector => Elasticsearch); a hypothetical interface sketch follows
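A sketch of what such a layer could look like, assuming a psycopg connection for pgvector (with the schema from the setup sketch above) and the official `elasticsearch` Python client; all class and method names here are illustrative, not an existing AI Framework API.

```python
# Hypothetical sketch of the proposed abstraction layer; class and method
# names are illustrative, not an existing AI Framework API.
from abc import ABC, abstractmethod

class VectorStore(ABC):
    """Backend-agnostic interface: callers never touch pgvector or ES directly."""

    @abstractmethod
    def upsert(self, doc_id: str, embedding: list[float], text: str) -> None: ...

    @abstractmethod
    def search(self, embedding: list[float], k: int = 5) -> list[str]: ...

class PgVectorStore(VectorStore):
    def __init__(self, conn):
        self.conn = conn  # a psycopg connection; "embeddings" schema assumed

    def upsert(self, doc_id, embedding, text):
        # pgvector accepts the '[1.0, 2.0, ...]' text format, hence str(embedding)
        self.conn.execute(
            "INSERT INTO embeddings (id, embedding, content) VALUES (%s, %s::vector, %s) "
            "ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding, "
            "content = EXCLUDED.content",
            (doc_id, str(embedding), text),
        )

    def search(self, embedding, k=5):
        # <=> is pgvector's cosine distance operator
        rows = self.conn.execute(
            "SELECT content FROM embeddings ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(embedding), k),
        ).fetchall()
        return [content for (content,) in rows]

class ElasticsearchStore(VectorStore):
    def __init__(self, client, index="embeddings"):
        self.client = client  # an elasticsearch.Elasticsearch client
        self.index = index

    def upsert(self, doc_id, embedding, text):
        self.client.index(index=self.index, id=doc_id,
                          document={"embedding": embedding, "content": text})

    def search(self, embedding, k=5):
        resp = self.client.search(
            index=self.index,
            knn={"field": "embedding", "query_vector": embedding,
                 "k": k, "num_candidates": 10 * k},
        )
        return [hit["_source"]["content"] for hit in resp["hits"]["hits"]]
```

With an interface like this, moving from pgvector to Elasticsearch (or back) becomes a constructor change rather than a rewrite of every consumer.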
Validation Process
References
- Figma from a workshop: https://www.figma.com/file/hsDkrLEghaTidehqDFvTy2/Embeddings-Workshop?type=whiteboard&node-id=0-1&t=X9im3MxQAO2YKJoK-0
- Summary: Global Search & AI Framework RAG Workshop - Session 1: https://docs.google.com/document/d/19RDKyy5MgcwyX1jWbc2I0zxHstufIS2OFvKvbgry3rc/edit#heading=h.chc7ph9dx4ua
- RAG eval: #443321 (comment 1791405038)