Iteration plan: RAG - Ben Venker, Pini Wietchner

Pre-read

Goals

Increase feature quality
- Create a RAG system capable of advanced retrieval
Enable comprehensive AI-powered capabilities across the DevSecOps platform

JTBD

As a GitLab user, I want to be able to ask questions about GitLab documentation
As a GitLab user, I want to be able to ask questions about GitLab issues and epics
As a member of the GitLab product development team, I want to be able to use RAG to enrich the context of a feature I am building

Iteration 1

Feature	PostgreSQL (PG) work	Elasticsearch (ES) work
Shipping with self-managed	Bundle a separate embedding database with the pgvector extension alongside the main PostgreSQL database; provide a way for instances to invoke the LLMs in use; documentation and support for self-managed	Bundle Elastic with Omnibus, et al.; leverage the cloud-on-k8s repo for Kubernetes users; provide a way for instances to invoke the LLMs in use; update existing docs for self-managed support
Privacy and access controls	Replicate existing permissions from the main database to the pgvector database	Leverage existing permissions and access controls in advanced search
Keyword search	Keyword retrievers for all required document types; keyword result ranking; get to BM25-level relevance, ranking, and precision	Get for free when data is stored in Elastic
Vector search	Retrievers for all required embedded document types	Embed required documents into Elastic
Hybrid search	Combine result sets into one and normalize rankings	Leverage the native hybrid search query endpoint so that a kNN query is combined with the current keyword query, and use native reciprocal rank fusion (RRF) to rerank and return the most relevant results; every Elastic cluster needs to be on version 8.12+ and have a license
Metadata filters	Ingest pipeline for all required filterable/sortable fields	Get for free when vectors are stored in Elastic
Experimentation & evaluation work	Analysis of the resources required to store embeddings; chunk size relevance testing and validation; keyword search ranking experimentation; hybrid search rank testing and validation (Was this result accurate? Did I expect to see it in this position?); response time; precision and other relevance metrics	Analysis of the resources required to store embeddings; chunk size relevance testing and validation; hybrid search rank tuning (Was this result accurate? Did I expect to see it in this position?); response time; precision and other relevance metrics
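
The hybrid search row above calls for combining keyword and vector result sets and reranking them with reciprocal rank fusion (RRF). The following is a minimal sketch of the general RRF idea, not of Elastic's implementation or of a decided PG design: each document scores the sum of 1 / (k + rank) over the result lists it appears in. The function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top hit of each list.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because RRF uses only rank positions, it avoids having to put BM25 scores and vector-similarity scores on a common scale, which is one possible way to approach the "normalize rankings" item on the PG side.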

Iteration 2

Feature	PG	ES
AI Reranking	Send hybrid search results to the reranker; work to host reranking model/AI Gateway	Send hybrid search results to the reranker; leverage native model hosting? Or: work to host reranking model/AI Gateway
Recursive retrieval	Determine the best way to chunk and store data in PostgreSQL; implement recursive retrieval	Leverage native ability to store nested chunks; implement recursive retrieval
Small-to-Big retrieval	Determine the best way to chunk and store data in PostgreSQL; implement recursive retrieval	Leverage native ability to store nested chunks
Embedded Tables	Table ingestion and embedding pipeline	Table ingestion and embedding pipeline
Experimentation & evaluation work	Compare candidate reranking models for best results; test and validate response relevance and accuracy for recursive retrieval; test and validate response relevance and accuracy for small-to-big retrieval	Compare candidate reranking models for best results; test and validate response relevance and accuracy for recursive retrieval; test and validate response relevance and accuracy for small-to-big retrieval
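
Small-to-big retrieval is commonly described as matching the query against small chunks and then handing the larger parent passage to the LLM. The sketch below assumes that reading. A toy bag-of-words vector stands in for a real embedding model, and an in-memory list stands in for PG or Elastic storage; all names and sample texts are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical parent passages; each is split into sentence-sized child chunks.
parents = {
    "ci": "Pipelines run jobs in stages. Runners execute the jobs. "
          "Artifacts are stored after each job.",
    "mr": "Merge requests propose changes. Reviewers approve merge requests. "
          "Approved changes are merged.",
}


def build_index(parents):
    """Embed every small chunk and remember which parent it came from."""
    index = []
    for parent_id, text in parents.items():
        for sentence in text.split(". "):
            index.append((embed(sentence.strip(".")), parent_id))
    return index


def small_to_big(query, index, parents, top_k=1):
    """Rank the small chunks, then return the distinct parent passages."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) == top_k:
            break
    return [parents[parent_id] for parent_id in parent_ids]


index = build_index(parents)
context = small_to_big("who can approve merge requests", index, parents)
print(context)
```

The part that matters for the storage question in the table is the link from each small chunk back to its parent, which either backend would need to persist in some form.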

Iteration 3

Feature	PG	ES
Routing	N/A	N/A
Query planning	NER and filter extraction	NER and filter extraction
Multi-document agent	Parsing, splitting, embedding, and storage mechanisms; store in memory for the current interaction and then optionally persist?	Parsing, splitting, embedding, and storage mechanisms; store in memory for the current interaction and then optionally persist?
Experimentation & evaluation work	Routing accuracy (did chat choose the right tool?); planned query accuracy	Routing accuracy (did chat choose the right tool?); planned query accuracy
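
Routing accuracy in the evaluation row above can be read as the share of test prompts for which chat picked the expected tool. The following is a minimal sketch under that assumption; the tool names, prompts, and record layout are hypothetical and not taken from an existing evaluation set.

```python
def routing_accuracy(examples):
    """Return the fraction of examples where the chosen tool matches the expected one."""
    if not examples:
        return 0.0  # Convention chosen here for an empty evaluation set.
    correct = sum(1 for ex in examples if ex["chosen_tool"] == ex["expected_tool"])
    return correct / len(examples)


# Hypothetical labeled evaluation set: which tool should answer each prompt,
# and which tool the router actually chose.
examples = [
    {"prompt": "How do I register a runner?",
     "expected_tool": "docs_search", "chosen_tool": "docs_search"},
    {"prompt": "Summarize the open epics for Q1",
     "expected_tool": "epic_search", "chosen_tool": "epic_search"},
    {"prompt": "Which issues mention pgvector?",
     "expected_tool": "issue_search", "chosen_tool": "docs_search"},
    {"prompt": "What is a merge train?",
     "expected_tool": "docs_search", "chosen_tool": "docs_search"},
]

accuracy = routing_accuracy(examples)
print(f"Routing accuracy: {accuracy:.2f}")
```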

Edited Feb 08, 2024 by Ben Venker