Gather resources and make a pathway to uplevel teams
Prompt engineering, dataset curation, and prompt iteration materials for Experimentation
✅ Create a new prompt to validate your idea
Prompt Engineering Resources:
- High-level guidelines to consider
- Validating an idea without writing code
- Validating an idea with Anthropic Workbench
Data Engineering Resources:
- https://cloud.google.com/learn/training/data-engineering-and-analytics
- https://www.cloudskillsboost.google/paths/16
- https://cloud.google.com/learn/certification/data-engineer
Tools you can use
⚠ Dataset Guide
- Why do I need this at this stage?
- How much data should I include?
- What is an output, ground truth, or expected answer? Should I include one? How do I determine what it is?
- Basic data / schema guidance (a minimal example follows this section)
- How should I approach data for evaluation purposes?
Existing resources:
- Purpose, Structure, and Examples
- Inputs and Uploading to LangSmith (no ground truth or output)
- Modifying existing CEF datasets
- Consider interviewing Lesley, who recently built her own dataset from scratch for Chat MRs
Other info:
- List of all existing datasets today used in CEF
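For the schema and ground-truth questions above, here is a minimal sketch of what a single evaluation record could look like, assuming a JSONL layout. The field names (`input`, `expected_output`, `metadata`) are illustrative only, not a required CEF or LangSmith schema, and the ground-truth field can simply be omitted while you only have inputs.

```python
# Minimal sketch of one evaluation dataset record (field names are illustrative).
import json

record = {
    "input": "How do I create a merge request from the command line?",
    # Optional ground truth / expected answer; omit while you only have inputs.
    "expected_output": "Use `glab mr create` from a feature branch.",
    "metadata": {"feature": "duo_chat", "source": "manual_curation"},
}

# Append the record to a JSONL file, one JSON object per line.
with open("duo_chat_dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```

A small handful of records like this is often enough to start manual spot checks before investing in a larger curated dataset.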
⚠ Iteratively make your prompt better
- How does this step differ from the usual software development lifecycle?
- Where do I experiment with these improvements?
- When should I do manual evaluation, automatic evaluation, or production evaluation? What is the difference?
- What should I try first in order to be more efficient with this step?
- Who can iterate on the prompt?
- How do I understand the impact that a prompt tested in LangSmith will have on the feature via CEF?
- What can be done with CEF and what can be done with LangSmith? How do I use both?
Existing resources:
LangSmith
- How to iterate on your prompt without a ground truth or expected output from your dataset (manual evaluation)
- How to use an automatic evaluator during experimentation
- LangSmith prompt evals using Playground
- AI Framework: Uploading a Dataset within LangSmith
- VR Feature LangSmith Prompt Iteration E2E
- VR new prompt based on the current Explain Vulnerability prompt in production
- RCA Feature LangSmith Prompt Iteration E2E
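As a companion to the upload and iteration guides above, the following is a minimal sketch of creating a small dataset with the LangSmith Python SDK. The dataset name and example fields are placeholders, it assumes an API key is configured in the environment, and exact method signatures may differ between SDK versions.

```python
# Sketch: upload a small dataset to LangSmith (names and fields are placeholders).
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY / LANGCHAIN_API_KEY from the environment

dataset = client.create_dataset(
    dataset_name="duo-chat-smoke-test",
    description="Hand-picked questions for prompt iteration",
)

client.create_examples(
    inputs=[{"question": "Explain this vulnerability: SQL injection in the login form"}],
    # Outputs are optional: omit them if you have no ground truth yet (manual evaluation).
    outputs=[{"answer": "An attacker can inject SQL via the username field..."}],
    dataset_id=dataset.id,
)
```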
ELI5
- ELI5 Instruction video
- Step-by-Step Guide for Conducting Evaluations using LangSmith at GitLab - ELI5 Evals
- Running Evaluations
- Dataset Creation Guidelines for GitLab AI Features
- Instructions to Upload and Evaluate Prompt Templates in LangSmith Playground
- ELI5 to record latency
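For the evaluation guides above, a minimal sketch of an automatic evaluation run with the LangSmith SDK might look like the following. The target function, evaluator, and dataset name are placeholders, and the location and signature of the `evaluate` helper may differ across SDK versions.

```python
# Sketch: run an automatic evaluation against a LangSmith dataset.
from langsmith.evaluation import evaluate


def target(inputs: dict) -> dict:
    # Placeholder: call your prompt / model here and return its answer.
    return {"answer": "stubbed model response for " + inputs["question"]}


def exact_match(run, example) -> dict:
    # Simple evaluator: score 1 if the model answer matches the ground truth exactly.
    got = (run.outputs or {}).get("answer", "")
    want = (example.outputs or {}).get("answer", "")
    return {"key": "exact_match", "score": int(got == want)}


results = evaluate(
    target,
    data="duo-chat-smoke-test",   # dataset name used in the upload sketch above
    evaluators=[exact_match],
    experiment_prefix="prompt-v2",
)
```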
CEF / PromptLib
- Prompt library local eval
- Rake task + local prompt library - Duo Chat
- A/B Experimentation Guide
- How to Docs for various eval pipelines
- Data for Diagnostic Testing
- Video how-to guides
- Video guide on using the CEF dashboard for Feature Analytics
- Eval Prompt Templates (https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/tree/main/data/prompts?ref_type=heads)
Anthropic console
- Easy AI Experimentation. Quickly validating AI experiment idea with Anthropic console. Part 1
- Easy AI Experimentation. Quickly validating AI experiment idea with Anthropic console. Part 2
- Easy AI Experimentation. Part 1. Experiment with consoles
- Easy AI Experimentation. Part 2. AI Experimentation project - Storage Setup
- Easy AI Experimentation. Part 3. AI Experimentation project - Tool
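Alongside the console walkthroughs, a prompt idea can also be validated in a few lines with the Anthropic Python SDK. This is a sketch only: the model name, system prompt, and user message are placeholders, and it assumes an `ANTHROPIC_API_KEY` is set in the environment.

```python
# Sketch: quickly validate a prompt idea against the Anthropic API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder: use the model you are targeting
    max_tokens=512,
    system="You are a security reviewer who explains vulnerabilities concisely.",
    messages=[
        {"role": "user", "content": "Explain why string-interpolated SQL is unsafe."}
    ],
)

print(response.content[0].text)
```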
Other
- Using Python Notebooks for Evaluating Duo Chat Prompts
- Prompt Engineering in GitLab Rails
- How to Prompt - presentation for custom models - video
- How to prompt - slides
- Building an AI Agent Evaluation Framework: Workflow Service Integration and Testing Strategies
Tools you can use:
Links
- ~~doc: add prompt playbook outline (gitlab-org/gitlab!160849 - closed)~~ Closed in favour of an AI Feature Development Playbook we will develop between the two teams; see gitlab-org/gitlab!160849 (comment 2020214334)
- https://docs.google.com/document/d/1TRzJHrDDfzRroMG3sKhmMh0IH1lp2_AyhBgkurYSFh0/edit#heading=h.f3zb5q4vh2jr
Work division
| Subject | DRI | Iteration |
|---|---|---|
| 1.1 | - | - |
| Understanding prompt engineering | 1 | |
| Best practices for writing effective prompts | 1 | |
| Prompt templates and their importance | | |
| Handling context and memory in prompts | | |
| 1.2 | - | - |
| Rudimentary data for Prompt Tuning with LangSmith | | |
| Prompt tuning for LLMs using LangSmith and Anthropic Workbench together + CEF | LangSmith and Anthropic Workbench part: @mksionek; CEF part: @tle_gitlab, @HongtaoYang | 1 and 2 |
| Optimizing inference speed and resource usage (AIF) | | |
| Implementing caching strategies (AIF) | | |
| Load balancing and scaling considerations (AIF) | | |
| 1.3 | - | - |
| Building Datasets for Eval | | |
| Spot checking and metric iteration | | |
| Baselining foundational models in the CEF | | |
| Using the CEF dashboard and troubleshooting | | |
| Using automated evaluation pipelines for CEF | | |
| Continuous monitoring and applying as guidance for Prompt Tuning | 2 | |
| A/B testing strategies for Gen AI features | 2 | |
| Using the Experimentation Framework in the CEF | | |