Iteration plan: RAG - Ben Venker, Pini Wietchner


Pre-read 

Goals 

  • Increase feature quality
    • Create a RAG system capable of advanced retrieval 
  • Enable comprehensive AI-powered capabilities across DevSecOps

JTBD

  • As a user of GitLab, I want to be able to ask questions about GitLab documentation
  • As a user of GitLab, I want to be able to ask questions about GitLab issues and epics
  • As a GitLab product development team, I want to be able to use RAG to enrich the context of a feature I am building

Iteration 1

Feature

PG work

ES work

Shipping with self-managed
  • PG work
    • Bundle a separate embedding database with the pgvector extension alongside the main Postgres database
    • Provide a way for instances to invoke the LLMs in use
    • Documentation and support for self-managed (SM)
  • ES work
    • Bundle Elastic with Omnibus, et al.
    • Leverage the cloud-on-k8s repo for k8s users
    • Provide a way for instances to invoke the LLMs in use
    • Update existing docs for self-managed support
Privacy and access controls
  • PG work
    • Replicate existing permissions from the main database to the pgvector database
  • ES work
    • Leverage existing permissions and access controls in advanced search
Keyword search
  • PG work
    • Keyword retrievers for all required document types
    • Keyword result ranking
    • Get to BM25-level relevance/ranking/precision
  • ES work
    • Get for free when data is stored in Elastic
Vector search
  • PG work
    • Retrievers for all required embedded document types
  • ES work
    • Embed required documents into Elastic
Hybrid search
  • PG work
    • Combine result sets into one and normalize rankings
  • ES work
    • Leverage the native hybrid search query endpoint: combine a kNN query with the current keyword query, then use native reciprocal rank fusion (RRF) to rerank and return the most relevant results
    • Every Elastic cluster needs to be on version 8.12+ and have a license
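
To make the rank-combination step concrete, here is a minimal Python sketch of reciprocal rank fusion. It only illustrates the general technique. It is not Elastic's implementation, and the function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions for illustration. On the PG path, something like this would have to run in application code. On the ES path, the native RRF does the equivalent server-side.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    the scores are summed and documents are sorted by total score.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result sets from a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because RRF uses only rank positions, it avoids having to normalize BM25 scores against vector similarity scores, which live on different scales.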
Metadata filters 
  • PG work
    • Ingest pipeline for all required filterable/sortable fields
  • ES work
    • Get for free when vectors are stored in Elastic
Experimentation & evaluation work 
  • PG work
    • Analysis of the amount of resources required to store embeddings
    • Chunk size relevance testing and validation
    • Keyword search ranking experimentation
    • Hybrid search rank testing and validation
    • Was this result accurate? Did I expect to see it in this position?
    • Response time
    • Precision and other relevance metrics
  • ES work
    • Analysis of the amount of resources required to store embeddings
    • Chunk size relevance testing and validation
    • Hybrid search rank tuning
    • Was this result accurate? Did I expect to see it in this position?
    • Response time
    • Precision and other relevance metrics
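
As a starting point for the evaluation items above, the following Python sketch shows a back-of-the-envelope storage estimate and two standard relevance metrics (precision@k and reciprocal rank). All function names and the example numbers (1,000,000 chunks, 768 dimensions, 4-byte floats) are hypothetical. The storage figure covers raw vector data only, so index and row overhead would come on top.

```python
def embedding_storage_bytes(num_chunks, dimensions, bytes_per_value=4):
    """Raw size of the stored vectors, excluding index and row overhead."""
    return num_chunks * dimensions * bytes_per_value


def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical corpus: 1,000,000 chunks embedded as 768-dimensional float32.
raw_bytes = embedding_storage_bytes(1_000_000, 768)
print(f"{raw_bytes / 1e9:.2f} GB of raw vector data")

# Hypothetical judged query: "a" and "c" are the relevant documents.
ranking = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranking, relevant, k=2))
print(reciprocal_rank(ranking, relevant))
```

Averaging these metrics over a fixed set of judged queries gives a way to compare chunk sizes or ranking configurations between the PG and ES approaches.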

Iteration 2

Feature

PG

ES

AI Reranking
  • PG
    • Send hybrid search results to reranker
    • Work to host reranking model/AI Gateway
  • ES
    • Send hybrid search results to reranker
    • Leverage native model hosting? OR:
    • Work to host reranking model/AI Gateway
Recursive retrieval
  • PG
    • Determine the best way to chunk and store data in Postgres
    • Implement recursive retrieval
  • ES
    • Leverage native ability to store nested chunks
    • Implement recursive retrieval
Small-to-Big retrieval 
  • PG
    • Determine the best way to chunk and store data in Postgres
    • Implement recursive retrieval
  • ES
    • Leverage native ability to store nested chunks
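
For reference, small-to-big retrieval as commonly described matches the query against small chunks and then hands the larger parent passage to the LLM. The Python sketch below illustrates only that lookup step. The `Chunk` type, the `parents` mapping, and all IDs are hypothetical and not tied to a Postgres or Elastic schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    parent_id: str
    text: str


def small_to_big(matched_chunks, parents):
    """Map ranked small-chunk hits to their parent passages.

    Parents are returned in the order of their best-ranked child,
    and each parent appears only once.
    """
    seen = set()
    context = []
    for chunk in matched_chunks:
        if chunk.parent_id in seen:
            continue
        seen.add(chunk.parent_id)
        context.append(parents[chunk.parent_id])
    return context


# Hypothetical parent passages and ranked small-chunk hits.
parents = {
    "p1": "Full section on configuring runners ...",
    "p2": "Full section on merge request approvals ...",
}
hits = [
    Chunk("c3", "p2", "approval rules"),
    Chunk("c1", "p1", "register a runner"),
    Chunk("c4", "p2", "required approvers"),
]

context = small_to_big(hits, parents)
print(context)
```

The open design question in the plan is where the parent/child link lives: a parent ID column alongside each embedded chunk on the PG side, or nested chunk storage on the ES side.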
Embedded Tables
  • PG
    • Table ingestion and embedding pipeline
  • ES
    • Table ingestion and embedding pipeline
Experimentation & evaluation work 
  • PG
    • Compare candidate reranking models for best results
    • Test and validate response relevance and accuracy for recursive retrieval
    • Test and validate response relevance and accuracy for small-to-big retrieval
  • ES
    • Compare candidate reranking models for best results
    • Test and validate response relevance and accuracy for recursive retrieval
    • Test and validate response relevance and accuracy for small-to-big retrieval

Iteration 3

Feature

PG

ES

Routing 
  • PG
    • N/A
  • ES
    • N/A
Query planning 
  • PG
    • NER and filter extraction
  • ES
    • NER and filter extraction
Multi-document agent
  • PG
    • Parsing, splitting, embedding, and storage mechanisms
    • Store in memory for current interaction and then persist optionally?
  • ES
    • Parsing, splitting, embedding, and storage mechanisms
    • Store in memory for current interaction and then persist optionally?
Experimentation & evaluation work 
  • PG
    • Routing accuracy (did chat choose the right tool?)
    • Planned query accuracy
  • ES
    • Routing accuracy (did chat choose the right tool?)
    • Planned query accuracy
Edited by Ben Venker