16.3: Scope of A/B Testing Platform for Prompt Engineering Experiments
Overview
This issue scopes the requirements for an A/B testing platform focused on prompt engineering, something like Optimizely for prompts: a tool purely for rapid experiments in AI-assisted features. The purpose of the platform is a lean system that tracks control and test versions so we can test and validate prompt-engineering hypotheses before rolling them out to production.
Functionality Requirements
- Isolate experiments from one another so that experiment X does not interfere with experiment Y.
- Define a success criterion based on a measurable metric (e.g. acceptance rate by language, resource usage, request time). We want to make data-driven decisions rather than decide on gut feeling; metrics come first. If an experiment can't be measured, we should first develop a metric for it.
- Allow developers to easily manage experiments. A developer should have autonomy over the lifecycle of an experiment without requiring a new deployment. This includes starting and stopping the experiment, but also choosing a winning variant (A/B/.../N) to receive 100% of the traffic once a decision is made.
- Run statistical analysis to determine whether a shift in the measured metric is significant or can be attributed to random variation (a t-test comes to mind).
- Experiment version control and dashboards
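The significance check above could be sketched as a two-sample Welch's t-test in pure Python. This is a minimal illustration, not the platform's implementation; the function name and the sample acceptance data are hypothetical, and a real analysis would likely use a library such as `scipy.stats.ttest_ind(equal_var=False)` and compare the resulting p-value against a chosen threshold.

```python
import math

def welch_t_test(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Unbiased sample variances
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

# Hypothetical per-request acceptance outcomes (1 = accepted, 0 = rejected)
control = [1, 0, 1, 1, 0, 1, 0, 1]
variant = [1, 1, 1, 0, 1, 1, 1, 1]
t, df = welch_t_test(variant, control)
```

With so few samples the t statistic here would not clear a typical significance threshold, which is exactly the kind of call the platform should make for us instead of a gut-feeling judgement.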
High-level Workflow
- Build a small Experimentation engine in the Model Gateway to distribute requests through different experiments/variants.
- Include the `experiments` data in the telemetry payload of the `/v2/completions` response.
- Store the `experiments` data in the client, e.g. VSCode.
- Send the `experiments` data in the subsequent `/v2/completions` request from the client.
- Record experiments using Prometheus labels and aggregate/visualise acceptance rate in Grafana.
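One way the experimentation engine in the Model Gateway could distribute requests across variants is deterministic bucketing: hash the experiment name together with a stable user identifier so each user always sees the same variant, and assignments stay statistically independent across experiments. A minimal sketch, with hypothetical function and experiment names:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants: list[str], weights: list[int]) -> str:
    """Deterministically pick a variant for a user within one experiment.

    Hashing (experiment, user_id) together keeps the assignment stable
    per user and uncorrelated with assignments in other experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % sum(weights)
    cumulative = 0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if bucket < cumulative:
            return variant
    return variants[-1]

# Hypothetical usage: a 50/50 split. Changing the weights to [0, 100]
# routes 100% of traffic to the winner without a redeploy.
chosen = assign_variant("user-123", "code-gecko-temperature", ["control", "test"], [50, 50])
```

Because the weights live in configuration rather than code, promoting a winning variant is a config change, which satisfies the "no new deployment" requirement above.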
Experiment MVC
- With language suffix: gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!305 (merged)
- Temperature of Code-Gecko: Experiment with a lower temperature with code-g... (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#229)
Resources
Edited by Tan Le