Create Code Suggestions Evaluations for Additional Context

Objective

Extend our ELI5 (Explain Like I'm 5) evaluation framework so it can test different types of additional context for code suggestions, preparing for future data-driven decisions that improve suggestion quality.

Description

We need to enhance the ELI5 evaluation framework so it can assess the impact of various types of additional context on code suggestion quality. This setup will enable future evaluations to identify which contextual information yields the largest improvement in suggestion accuracy and relevance.

Key goals:

  1. Modify ELI5 to accommodate different context types
  2. Develop a standardized evaluation process for comparing context types
  3. Implement metrics to quantify suggestion quality improvements
  4. Prepare the framework for future analysis of different context types
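To make goal 1 concrete, one possible way to model interchangeable context types is a small wrapper that pairs a context kind with its payload, so the framework can swap sources in and out per evaluation run. All names here (`ContextSource`, `build_prompt`) are illustrative sketches, not the actual ELI5 API:

```python
from dataclasses import dataclass

# Hypothetical representation of a single additional-context source;
# the real ELI5 interfaces may differ.
@dataclass
class ContextSource:
    kind: str     # e.g. "x-ray" or "open-tabs"
    content: str  # raw text to inject into the prompt

def build_prompt(code_prefix: str, sources: list[ContextSource]) -> str:
    """Prepend each context block to the code being completed."""
    blocks = [f"<context type={s.kind}>\n{s.content}\n</context>" for s in sources]
    return "\n".join(blocks + [code_prefix])
```

With this shape, an evaluation run for a given context type only needs to change which `ContextSource` instances are passed in; the rest of the pipeline stays identical across runs.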

Tasks

  1. Identify and list potential types of additional context to evaluate in the future (e.g., x-ray, open tabs, etc.)
  2. Update the ELI5 framework to handle different context types
  3. Create a diverse set of test cases covering various coding scenarios
  4. Define and implement quantifiable metrics for measuring suggestion quality
  5. Set up the infrastructure to execute evaluations for each context type
  6. Document the process for running evaluations and collecting results
  7. Create templates for result analysis and reporting

Acceptance Criteria

  • ELI5 framework is successfully set up to support evaluation of different context types
  • A comprehensive set of test cases is developed and documented
  • Quantifiable metrics for measuring suggestion quality are implemented
  • Documentation for running evaluations and collecting results is complete
  • Templates for result analysis and reporting are created
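As a starting point for the result-analysis templates, per-context-type scores could be aggregated into a mean score per type, making runs directly comparable. The function name and input shape below are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def summarize(results):
    """Aggregate (context_type, score) pairs into a mean score per type."""
    by_type = defaultdict(list)
    for ctx_type, score in results:
        by_type[ctx_type].append(score)
    return {t: mean(scores) for t, scores in by_type.items()}
```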