
# 🏃‍♂️ Move ELI5 to CI Run 🚀

📝 Description

We need to implement a system to run ELI5 (Evaluate Like I'm 5) supported pipelines on a predefined schedule to help developers monitor the quality of AI features over time.

🌟 Importance

  1. 📊 Provides product with a statistical overview of AI feature performance over time
  2. 🐛 Increases chances of catching errors if model provider experiences quality regression

πŸ”„ Difference from PL

  1. 🖥️ PL currently doesn't provide a unique interface to connect eval pipelines to daily runs
  2. 🔍 Running two systems daily helps find evaluation mismatches (Example: Issue #52)
  3. 🚀 ELI5 enhances PL by building a generic interface following Evaluate Like I'm 5 principles
  4. 🀝 We don't compete with the PL rather we enhance it by building a generic interface. With the MV team, we agreed that we need to provide Langsmith (ELI5) as the entry point for the developers to code their evaluation pipelines (with the PL support)

💡 Proposed Implementation

Run ELI5 as a scheduled CI pipeline:

  1. 📝 Developers register evaluation pipelines for daily run, specifying input dataset and additional parameters
  2. ⏰ Daily at a set time, ELI5 starts as part of the scheduled pipeline:
     a. 🚀 Starts GDK instance with latest changes
     b. 🛠️ Runs Rake task to collect required outputs
     c. 📊 Runs registered eval pipelines, rendering results in the Langsmith UI
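
The flow above could be sketched roughly as follows. This is a minimal illustration, not the actual ELI5 code: `EvalPipeline`, `Registry`, and `run_daily` are hypothetical names, and the three callables passed to `run_daily` stand in for starting GDK, running the Rake task, and running one eval pipeline (which is where results would be sent to Langsmith). No real GDK, Rake, or Langsmith calls are made here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class EvalPipeline:
    """One registered evaluation pipeline (hypothetical structure)."""
    name: str
    dataset: str                      # input dataset identifier
    params: Dict[str, Any] = field(default_factory=dict)  # additional parameters


class Registry:
    """Holds the pipelines that developers opted into the daily run."""

    def __init__(self) -> None:
        self._pipelines: Dict[str, EvalPipeline] = {}

    def register(self, name: str, dataset: str, **params: Any) -> EvalPipeline:
        # Reject duplicates so two teams cannot silently overwrite each other.
        if name in self._pipelines:
            raise ValueError(f"pipeline {name!r} is already registered")
        pipeline = EvalPipeline(name=name, dataset=dataset, params=params)
        self._pipelines[name] = pipeline
        return pipeline

    def all(self) -> List[EvalPipeline]:
        return list(self._pipelines.values())


def run_daily(
    registry: Registry,
    start_gdk: Callable[[], None],
    collect_outputs: Callable[[], List[Any]],
    run_eval: Callable[[EvalPipeline, List[Any]], Any],
) -> Dict[str, Any]:
    """Run steps a, b, c once; return a result (or the exception) per pipeline."""
    start_gdk()                  # a. start the GDK instance (assumed callable)
    outputs = collect_outputs()  # b. collect required outputs (assumed callable)
    results: Dict[str, Any] = {}
    for pipeline in registry.all():  # c. run every registered pipeline
        try:
            results[pipeline.name] = run_eval(pipeline, outputs)
        except Exception as exc:
            # One failing pipeline should not block the remaining ones.
            results[pipeline.name] = exc
    return results


if __name__ == "__main__":
    registry = Registry()
    # Pipeline and dataset names below are made up for the example.
    registry.register("example_pipeline", dataset="example-dataset", sample_size=50)
    report = run_daily(
        registry,
        start_gdk=lambda: print("starting GDK (stub)"),
        collect_outputs=lambda: ["output-1", "output-2"],
        run_eval=lambda p, outs: {"dataset": p.dataset, "evaluated": len(outs)},
    )
    print(report)
```

Passing the three steps in as callables keeps the scheduling logic testable without a GDK instance, and storing exceptions per pipeline means a single broken evaluation still leaves results for the others.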