Document how to evaluate prompt improvements and A/B testing

Problem

The groupai framework and groupduo chat teams have started using Prompt Library (provided by groupai model validation) to evaluate the performance scores of prompt improvement MRs and to run A/B tests.

Example:

However, these processes are not documented yet, so engineers have no reference for how to use them.

Proposal

Document how to evaluate prompt improvements and run A/B tests with Prompt Library.

Related to

Edited by Bruno Cardoso