Collect Code Creation Baseline Evaluation Metrics

Objective

Run evaluations to establish baseline metrics for code completion and code generation capabilities, focusing on latency and quality. These baselines will serve as the benchmark against which future model changes and improvements are measured.

Tasks

  1. Set up the evaluation environment for code completion and code generation.
  2. Run a comprehensive set of tests covering various programming languages and scenarios.
  3. Collect and analyze the data for both code completion and code generation:
    • Measure latency (for example, mean and tail-percentile end-to-end response time per request)
    • Assess output quality against the predefined criteria
  4. Document the results, including:
    • Test setup and methodology
    • Raw data collected
    • Analyzed results with clear metrics for latency and quality
    • Any observations or patterns noticed during testing
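The latency portion of step 3 could be collected with a small harness along these lines. This is a minimal sketch, not a prescribed implementation: `generate(prompt)` is a hypothetical stand-in for the model call under test, and the choice of mean/p50/p95 as summary statistics is an assumption, not something this ticket specifies.

```python
import statistics
import time

def percentile(samples, p):
    # Nearest-rank percentile of a list of latency samples (seconds).
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def run_latency_eval(generate, prompts):
    # Time each completion/generation request and return summary metrics.
    # `generate` is a placeholder for the model endpoint being evaluated.
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "p50_s": percentile(samples, 50),
        "p95_s": percentile(samples, 95),
    }
```

Reporting both a central value (mean or p50) and a tail percentile (p95) is one reasonable way to make the raw data in step 4 comparable across future runs, since tail latency often regresses before the mean does.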

Deliverable

Create a detailed report in the internal handbook documenting:

  1. The evaluation process
  2. Baseline metrics for both code completion and code generation
  3. Analysis of the results
  4. Recommendations for future improvements or areas of focus

Impact

This baseline evaluation will provide crucial data for:

  • Measuring the effectiveness of future model improvements
  • Identifying areas that require optimization
  • Setting performance targets for upcoming releases