# Move ELI5 to CI Run
## Description
We need to implement a system to run ELI5 (Evaluate Like I'm 5) supported pipelines on a predefined schedule to help developers monitor the quality of AI features over time.
## Importance
- Provides product with a statistical overview of AI feature performance over time
- Increases the chances of catching errors if a model provider experiences a quality regression
## Difference from PL
- PL currently doesn't provide a unique interface to connect eval pipelines to daily runs
- Running two systems daily helps find evaluation mismatches (Example: Issue #52)
- ELI5 enhances PL by building a generic interface following Evaluate Like I'm 5 principles
- We don't compete with PL; rather, we enhance it. With the MV team, we agreed to provide LangSmith (ELI5) as the entry point for developers to code their evaluation pipelines (with PL support)
## Proposed Implementation
Run ELI5 as a scheduled CI pipeline:
- Developers register evaluation pipelines for the daily run, specifying the input dataset and additional parameters
- Daily at a set time, ELI5 starts as part of the scheduled pipeline:
  1. Starts a GDK instance with the latest changes
  2. Runs a Rake task to collect the required outputs
  3. Runs the registered eval pipelines, rendering results in the LangSmith UI
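
The registration and daily-run flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: `DailyRegistry`, `RegisteredPipeline`, `scheduled_job`, and the `start_gdk` / `collect_outputs` callables are hypothetical names, not part of ELI5, PL, or LangSmith. Publishing results to the LangSmith UI is deliberately left out, since the issue does not specify how that would be done.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical signature for an evaluation pipeline: it receives the
# dataset name and the extra parameters, and returns a result dict.
EvalFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class RegisteredPipeline:
    name: str
    dataset: str                      # input dataset for the daily run
    run: EvalFn                       # the evaluation pipeline itself
    params: Dict[str, Any] = field(default_factory=dict)


class DailyRegistry:
    """Holds the pipelines that developers registered for the daily run."""

    def __init__(self) -> None:
        self._pipelines: Dict[str, RegisteredPipeline] = {}

    def register(self, name: str, dataset: str, run: EvalFn, **params: Any) -> None:
        if name in self._pipelines:
            raise ValueError(f"pipeline {name!r} is already registered")
        self._pipelines[name] = RegisteredPipeline(name, dataset, run, params)

    def run_all(self) -> Dict[str, Dict[str, Any]]:
        """Run every registered pipeline; one failure does not stop the rest."""
        results: Dict[str, Dict[str, Any]] = {}
        for p in self._pipelines.values():
            try:
                results[p.name] = {"status": "ok", "output": p.run(p.dataset, p.params)}
            except Exception as exc:
                results[p.name] = {"status": "failed", "error": str(exc)}
        return results


def scheduled_job(
    registry: DailyRegistry,
    start_gdk: Callable[[], None],
    collect_outputs: Callable[[], None],
) -> Dict[str, Dict[str, Any]]:
    """Mirror the daily sequence: start GDK, collect outputs, run pipelines."""
    start_gdk()        # placeholder for starting a GDK instance
    collect_outputs()  # placeholder for the Rake task that collects outputs
    return registry.run_all()


if __name__ == "__main__":
    registry = DailyRegistry()
    registry.register(
        "example_eval",                       # hypothetical pipeline name
        dataset="example_dataset",            # hypothetical dataset name
        run=lambda ds, params: {"dataset": ds, **params},
        threshold=0.8,
    )
    report = scheduled_job(registry, start_gdk=lambda: None, collect_outputs=lambda: None)
    print(report)
```

Catching failures per pipeline means one broken evaluation still leaves results for the others, which matters for a job that runs unattended every day.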