System: RAG Validation

Support for RAG validation

Evaluation of RAG architectures is crucial for understanding the performance and effectiveness of our RAG systems. There are two distinct systems to evaluate: the search system and the question answering (QA) system. A comprehensive evaluation strategy should cover both, assessing each individually as well as how well they integrate and complement each other to provide accurate, relevant answers to users' queries.

Validation Aspects

Evaluating the Search / Context Injection Basis

Key metrics may include precision and recall at various cutoffs (precision@K and recall@K). The evaluation of the search system will require insight into:

  • the context being returned and injected into the prompt
  • cosine similarity scores and re-ranker scores for that context
  • semantic-similarity-based search methods
  • other non-semantic methods
    • Zoekt for code
    • BM25
    • TF-IDF
    • Ctags
    • Xray libraries
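The precision@K and recall@K metrics mentioned above can be sketched as follows; the chunk IDs and relevance set here are purely illustrative:

```python
# Sketch of precision@K / recall@K for evaluating a ranked retrieval system.

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunks that are relevant."""
    top_k = retrieved[:k]
    hits = sum(1 for chunk in top_k if chunk in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunks that appear in the top-k."""
    top_k = retrieved[:k]
    hits = sum(1 for chunk in top_k if chunk in relevant)
    return hits / len(relevant)

retrieved = ["c7", "c2", "c9", "c4", "c1"]   # ranked search output
relevant = {"c2", "c4", "c8"}                # ground-truth relevant chunks

print(precision_at_k(retrieved, relevant, 5))  # 0.4
print(recall_at_k(retrieved, relevant, 5))     # 0.6666666666666666
```

The same harness can score any of the retrieval backends listed above (semantic search, BM25, TF-IDF, and so on), since it only depends on the ranked list of returned chunk IDs.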

Evaluating the Question / Prompt Handling

Evaluation of this system will likely mirror the method we have devised to test documentation-related questions for Duo Chat.

Dataset Creation

One method for dataset creation might be selecting random chunks in our documentation and prompting an LLM to generate questions that those chunks could answer. Logging which chunks are expected to be relevant to which prompts might help us test the ability of the system to identify and retrieve the chunks semantically tied to the generated questions.
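A minimal sketch of this dataset-creation loop is below. The `generate_question` function is a hypothetical stand-in for a real LLM call, and the chunking strategy (fixed-size character spans) is only one of several possibilities:

```python
# Sketch of synthetic Q/A dataset creation from documentation chunks.
import random

def chunk_document(text, chunk_size=200):
    """Split a document into fixed-size character chunks (naive strategy)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def generate_question(chunk):
    # Placeholder for an LLM prompt such as:
    # "Write a question that the following passage answers: {chunk}"
    return f"QUESTION-FOR[{chunk[:30]}...]"

def build_dataset(docs, samples_per_doc=2, seed=0):
    """Sample random chunks and log which chunk each question came from."""
    rng = random.Random(seed)
    dataset = []
    for doc_id, text in docs.items():
        chunks = chunk_document(text)
        n = min(samples_per_doc, len(chunks))
        for idx in rng.sample(range(len(chunks)), n):
            dataset.append({
                "question": generate_question(chunks[idx]),
                "relevant_chunk": (doc_id, idx),  # expected retrieval target
            })
    return dataset
```

Because each generated question is logged against its source chunk, the retrieval system can later be scored on whether it surfaces exactly that chunk for that question.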

Features Support

Validation of RAG implementation strategies has potential impacts across numerous features, including:

  • Global Search
  • Duo Chat
  • Custom Models / Model Personalization
  • Code Suggestions
  • Explain this Vulnerability

Feature Development Support

AI Model Validation could provide support during the feature development stage, including:

  • identifying appropriate prompt library datasets
  • identifying ideal chunking/tokenization spans
  • metrics for comparison of outputs (including LLM judge to augment limited human labeling)
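The LLM-judge approach mentioned above can be sketched as follows. The prompt wording and the `call_llm` parameter are illustrative assumptions, not a reference to any specific model API:

```python
# Sketch of an LLM-as-judge score to augment limited human labeling.
# `call_llm` is a hypothetical stand-in for whichever model API is used.

JUDGE_PROMPT = """Compare the candidate answer with the reference answer.
Reply with a single integer score from 1 (wrong) to 5 (equivalent).

Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Score:"""

def judge_answer(question, reference, candidate, call_llm):
    """Ask a judge model to score a candidate answer against a reference."""
    prompt = JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate
    )
    reply = call_llm(prompt)
    score = int(reply.strip().split()[0])  # expect a bare integer reply
    return max(1, min(5, score))           # clamp to the 1-5 scale

# Usage with a stubbed judge, so the sketch is testable without a model:
fake_llm = lambda prompt: "4"
print(judge_answer("What is RAG?", "Retrieval-augmented generation.",
                   "It augments generation with retrieval.", fake_llm))  # 4
```

Judge scores like this are cheap to collect at scale, but they should be spot-checked against human labels before being trusted for feature comparisons.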

Links / references

Blueprints

Edited by Susie Bitters