Re-visit documentation about existing evaluation pipelines

Problem to solve

The CEF repository now contains all evaluation pipelines that were previously implemented in ELI5 or Prompt Library. However, documentation for these evaluation pipelines is missing, incomplete, or scattered across different locations within the CEF repository.

Proposal

Ensure that every evaluation pipeline has a dedicated documentation page. This documentation should include the following information:

1. Evaluation pipeline name
2. How to run it using the CEF CLI command
3. Link to the LangSmith dataset that can be used to run the pipeline, including an explanation of the dataset's structure so that other datasets with the same structure can be used
4. Description of the evaluation pipeline
5. Evaluators used in the evaluation pipeline
6. Which metrics are collected and how to interpret them
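As a sketch, the six items above could be captured in a shared page skeleton so every pipeline's documentation has the same shape. The snippet below is illustrative only — the function name and section titles are not an existing CEF helper, just one way to encode the checklist:

```python
# Illustrative only: generate a markdown skeleton covering the six
# required documentation items for one evaluation pipeline.
SECTIONS = [
    "How to run (CEF CLI command)",
    "LangSmith dataset and its structure",
    "Description",
    "Evaluators",
    "Metrics and how to interpret them",
]


def doc_skeleton(pipeline: str) -> str:
    # Item 1, the pipeline name, becomes the page title; the
    # remaining items become second-level sections to fill in.
    lines = [f"# {pipeline}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


print(doc_skeleton("duo-chat regression"))
```

Starting each page from the same skeleton makes it easy to spot which of the six required items a given pipeline's documentation is still missing.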

Here is the list of existing evaluation pipelines:

  • code-suggestions evaluate
  • duo-chat regression
  • duo-chat follow-up
  • duo-chat docs
  • duo-chat qa-resources
  • duo-chat context-use
  • duo-chat code-explain
  • duo-chat code-refactor
  • duo-chat code-test
  • duo-chat code-fix
  • duo-chat pairwise
  • ai-gateway evaluate
  • duo-code-review evaluate
  • duo-workflow swe
  • duo-workflow fix-broken-pipeline (will be deprecated)
  • root-cause-analysis eval
  • vulnerability-resolution eval

Please place the documentation under doc/eval_pipelines/<feature>/<doc.md>.
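Assuming the feature is the first token of each pipeline command (e.g. duo-chat for duo-chat regression) and the remainder names the page, the target paths could be derived as follows. This is a sketch of the proposed convention, not existing tooling:

```python
# Illustrative only: map pipeline commands to documentation paths
# under the proposed doc/eval_pipelines/<feature>/ layout.
PIPELINES = [
    "code-suggestions evaluate",
    "duo-chat regression",
    "duo-workflow swe",
    # ... remaining pipelines from the list above
]


def doc_path(pipeline: str) -> str:
    # Assumption: feature = first token, page name = the rest.
    feature, _, doc = pipeline.partition(" ")
    return f"doc/eval_pipelines/{feature}/{doc.replace(' ', '_')}.md"


for pipeline in PIPELINES[:3]:
    print(doc_path(pipeline))  # e.g. doc/eval_pipelines/duo-chat/regression.md
```

Under this assumption, all duo-chat pipelines land in a single doc/eval_pipelines/duo-chat/ directory, which also satisfies the per-feature consolidation described below.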

Further details

This work may require creating documentation from scratch, updating existing documentation, or consolidating existing documentation into a single location per feature.

Links / references

Documentation for dataset creation pipelines is covered by this issue: #672 (closed)

Some evaluation pipelines use generic evaluators. These generic evaluators will be documented in: #747 (closed)

Edited by Alexander Chueshev