Gather resources and make a pathway to uplevel teams
Prompt engineering, dataset curation, and prompt iteration materials for Experimentation
✅ Create a new prompt to validate your idea
Prompt Engineering Resources:
- High-level guidelines to consider
- Validating an idea without writing code
- Validating an idea with Anthropic Workbench
Data Engineering Resources:
- https://cloud.google.com/learn/training/data-engineering-and-analytics
- https://www.cloudskillsboost.google/paths/16
- https://cloud.google.com/learn/certification/data-engineer
Tools you can use
⚠ Dataset Guide
- Why do I need this at this stage?
- How much data should I include?
- What is an output, ground truth, or expected answer? Should I include one? How do I determine what it is?
- Basic data / schema guidance (a minimal example follows this section)
- How should I approach data for evaluation purposes?
Existing resources:
- Purpose, Structure, and Examples
- Inputs and Uploading to LangSmith (no ground truth or output)
- Modifying existing CEF datasets
- Consider interviewing Lesley, who recently built her own dataset from scratch for Chat MRs
Other info:
- List of all existing datasets today used in CEF
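For the schema and ground-truth questions above, here is a minimal sketch of what a single evaluation record could look like, assuming a JSONL layout. The field names (`input`, `expected_output`, `metadata`) are illustrative only, not a required CEF or LangSmith schema, and the ground-truth field can simply be omitted while you only have inputs.

```python
# Minimal sketch of one evaluation dataset record (field names are illustrative).
import json

record = {
    "input": "How do I create a merge request from the command line?",
    # Optional ground truth / expected answer; omit while you only have inputs.
    "expected_output": "Use `glab mr create` from a feature branch.",
    "metadata": {"feature": "duo_chat", "source": "manual_curation"},
}

# Append the record to a JSONL file, one JSON object per line.
with open("duo_chat_dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```

A small handful of records like this is often enough to start manual spot checks before investing in a larger curated dataset.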
⚠ Iteratively make your prompt better
- How does this step differ from the usual software development lifecycle?
- Where do I experiment with these improvements?
- When should I do manual evaluation, automatic evaluation, or production evaluation? What is the difference?
- What should I try first in order to be more efficient with this step?
- Who can iterate on the prompt?
- How do I understand the impact that a prompt tested in LangSmith will have on the feature via CEF?
- What can be done with CEF and what can be done with LangSmith? How do I use both?
Existing resources:
LangSmith
- How to iterate on your prompt without a ground truth or expected output from your dataset (manual evaluation)
- How to use an automatic evaluator during experimentation
- LangSmith prompt evals using Playground
- AI Framework: Uploading a Dataset within LangSmith
- VR Feature LangSmith Prompt Iteration E2E
- VR new prompt based on the current Explain Vulnerability prompt in production
- RCA Feature LangSmith Prompt Iteration E2E
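As a companion to the upload and iteration guides above, the following is a minimal sketch of creating a small dataset with the LangSmith Python SDK. The dataset name and example fields are placeholders, it assumes an API key is configured in the environment, and exact method signatures may differ between SDK versions.

```python
# Sketch: upload a small dataset to LangSmith (names and fields are placeholders).
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY / LANGCHAIN_API_KEY from the environment

dataset = client.create_dataset(
    dataset_name="duo-chat-smoke-test",
    description="Hand-picked questions for prompt iteration",
)

client.create_examples(
    inputs=[{"question": "Explain this vulnerability: SQL injection in the login form"}],
    # Outputs are optional: omit them if you have no ground truth yet (manual evaluation).
    outputs=[{"answer": "An attacker can inject SQL via the username field..."}],
    dataset_id=dataset.id,
)
```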
ELI5
- ELI5 Instruction video
- Step-by-Step Guide for Conducting Evaluations using LangSmith at GitLab - ELI5 Evals
- Running Evaluations
- Dataset Creation Guidelines for GitLab AI Features
- Instructions to Upload and Evaluate Prompt Templates in LangSmith Playground
- ELI5 to record latency
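For the evaluation guides above, a minimal sketch of an automatic evaluation run with the LangSmith SDK might look like the following. The target function, evaluator, and dataset name are placeholders, and the location and signature of the `evaluate` helper may differ across SDK versions.

```python
# Sketch: run an automatic evaluation against a LangSmith dataset.
from langsmith.evaluation import evaluate


def target(inputs: dict) -> dict:
    # Placeholder: call your prompt / model here and return its answer.
    return {"answer": "stubbed model response for " + inputs["question"]}


def exact_match(run, example) -> dict:
    # Simple evaluator: score 1 if the model answer matches the ground truth exactly.
    got = (run.outputs or {}).get("answer", "")
    want = (example.outputs or {}).get("answer", "")
    return {"key": "exact_match", "score": int(got == want)}


results = evaluate(
    target,
    data="duo-chat-smoke-test",   # dataset name used in the upload sketch above
    evaluators=[exact_match],
    experiment_prefix="prompt-v2",
)
```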
CEF / PromptLib
- Prompt library local eval
- Rake task + local prompt library - Duo Chat
- A/B Experimentation Guide
- How to Docs for various eval pipelines
- Data for Diagnostic Testing
- Video how-to guides
- Video guide on using the CEF dashboard for Feature Analytics
- Eval Prompt Templates (https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/tree/main/data/prompts?ref_type=heads)
Anthropic console
- Easy AI Experimentation. Quickly validating AI experiment idea with Anthropic console. Part 1
- Easy AI Experimentation. Quickly validating AI experiment idea with Anthropic console. Part 2
- Easy AI Experimentation. Part 1. Experiment with consoles
- Easy AI Experimentation. Part 2. AI Experimentation project - Storage Setup
- Easy AI Experimentation. Part 3. AI Experimentation project - Tool
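Alongside the console walkthroughs, a prompt idea can also be validated in a few lines with the Anthropic Python SDK. This is a sketch only: the model name, system prompt, and user message are placeholders, and it assumes an `ANTHROPIC_API_KEY` is set in the environment.

```python
# Sketch: quickly validate a prompt idea against the Anthropic API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder: use the model you are targeting
    max_tokens=512,
    system="You are a security reviewer who explains vulnerabilities concisely.",
    messages=[
        {"role": "user", "content": "Explain why string-interpolated SQL is unsafe."}
    ],
)

print(response.content[0].text)
```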
Other
- Using Python Notebooks for Evaluating Duo Chat Prompts
- Prompt Engineering in GitLab Rails
- How to Prompt - presentation for custom models - video
- How to prompt - slides
- Building an AI Agent Evaluation Framework: Workflow Service Integration and Testing Strategies
Tools you can use:
Links
- ~~doc: add prompt playbook outline (gitlab-org/gitlab!160849 - closed)~~ Closed in favour of an AI Feature Development Playbook we will develop between the two teams; see gitlab-org/gitlab!160849 (comment 2020214334)
- https://docs.google.com/document/d/1TRzJHrDDfzRroMG3sKhmMh0IH1lp2_AyhBgkurYSFh0/edit#heading=h.f3zb5q4vh2jr
Work division
| Subject | DRI | Iteration |
|---|---|---|
| 1.1 | - | - |
| Understanding prompt engineering | 1 | |
| Best practices for writing effective prompts | 1 | |
| Prompt templates and their importance | | |
| Handling context and memory in prompts | | |
| 1.2 | - | - |
| Rudimentary data for Prompt Tuning with LangSmith | | |
| Prompt tuning for LLMs using LangSmith and Anthropic Workbench together + CEF | LangSmith and Anthropic Workbench part: @mksionek; CEF part: @tle_gitlab, @HongtaoYang | 1 and 2 |
| Optimizing inference speed and resource usage (AIF) | | |
| Implementing caching strategies (AIF) | | |
| Load balancing and scaling considerations (AIF) | | |
| 1.3 | - | - |
| Building Datasets for Eval | | |
| Spot checking and metric iteration | | |
| Baselining foundational models in the CEF | | |
| Using the CEF dashboard and troubleshooting | | |
| Using automated evaluation pipelines for CEF | | |
| Continuous monitoring and applying as guidance for Prompt Tuning | 2 | |
| A/B testing strategies for Gen AI features | 2 | |
| Using the Experimentation Framework in the CEF | | |