# Synthetic Prompt Eval Dataset Generator Feedback

## Context
This issue is used to collect internal feedback for the Synthetic Prompt Eval Dataset Generator tool.
The tool uses an LLM (Claude) to automatically generate synthetic datasets for prompt evaluations. It enables teams to quickly generate an initial dataset for prompt evaluation, which can serve as a foundation for more refined testing approaches later. This helps bridge the gap between implementing prompt evaluation logic and having relevant datasets to test with.
## How to get started
Follow these steps to use the Synthetic Prompt Eval Dataset Generator:
- Make sure you have valid `ANTHROPIC_API_KEY` and `LANGCHAIN_API_KEY` in your `.env` file
- Install eval dependencies:

  ```shell
  poetry install --with eval
  ```

- Run the `generate-dataset` command with appropriate parameters:

  ```shell
  poetry run generate-dataset <prompt_id> <version> <dataset_name> --upload
  ```

  Where:

  - `prompt_id`: The ID of the AIGW prompt (e.g., `chat/explain_code`)
  - `version`: The version of the AIGW prompt
  - `dataset_name`: Name for the output dataset (used as the filename when saving the dataset locally, and as the name of the dataset in LangSmith when the `--upload` option is used)

  For example:

  ```shell
  poetry run generate-dataset chat/explain_code 1.0.2 duo_chat.explain_code.2 --upload
  ```

- The tool will:
  - Analyze the prompt definition to understand its purpose
  - Create diverse input examples covering varied cases
  - Generate expected outputs for each input
  - Save the resulting dataset as a JSONL file
  - Optionally upload the dataset to LangSmith (if the `--upload` flag is used)
See the documentation for more information, including CLI options.
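Since the output is plain JSONL, one example per line, it is easy to inspect or post-process with the standard library. The sketch below shows what a single line *might* contain; the `inputs`/`outputs` field names and their contents are assumptions for illustration, not the tool's documented schema, which depends on the prompt definition:

```python
import json

# Hypothetical example of one line in the generated JSONL file.
# The field names ("inputs", "outputs") are assumptions, not the
# tool's documented schema.
sample_line = json.dumps({
    "inputs": {"question": "What does this function do?",
               "code": "def add(a, b):\n    return a + b"},
    "outputs": {"answer": "It returns the sum of its two arguments."},
})

# Each line of a JSONL file is an independent JSON object,
# so a dataset can be checked line by line:
record = json.loads(sample_line)
print(sorted(record.keys()))  # ['inputs', 'outputs']
```
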
## How to leave feedback
- Please post a comment on this issue to leave your feedback
- Include as much information as possible, e.g., the command you used, the quality of the generated dataset, any issues encountered, etc.
- Screenshots of problems and examples of generated datasets are greatly appreciated!
- Share how you used the generated dataset in your evaluation process
- Positive feedback is also welcome 😸
## Known limitations
- The `--upload` option will show an error if a dataset with the same name already exists in LangSmith.
  - If you want to replace an existing dataset, delete it first.
  - If you want to add examples to an existing dataset, download the old dataset as JSONL, combine it with the newly generated examples, and upload the result to LangSmith as a new dataset (see also the instructions in the datasets project).
- The current implementation generates a maximum of 8,192 tokens, which limits the number of examples that can be generated in a single execution. To generate larger datasets, run the tool multiple times without the `--upload` option, then combine the resulting JSONL files and upload them as a new dataset.
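The combine step from the last limitation can be sketched with a small helper. This is a minimal local sketch, not part of the tool: the file names are hypothetical, and the combined file would still be uploaded to LangSmith afterwards as a new dataset.

```python
import json
from pathlib import Path

def combine_jsonl(parts, out_path):
    """Concatenate several JSONL files into one, skipping blank lines
    and validating that every line parses as JSON."""
    out = Path(out_path)
    with out.open("w", encoding="utf-8") as dst:
        for part in parts:
            for line in Path(part).read_text(encoding="utf-8").splitlines():
                if not line.strip():
                    continue
                json.loads(line)  # fail early on a corrupt line
                dst.write(line + "\n")
    return out

# Example: merge two partial runs (hypothetical file names) into one file.
Path("part1.jsonl").write_text('{"inputs": {"q": "a"}}\n', encoding="utf-8")
Path("part2.jsonl").write_text('{"inputs": {"q": "b"}}\n', encoding="utf-8")
combined = combine_jsonl(["part1.jsonl", "part2.jsonl"], "combined.jsonl")
print(len(combined.read_text(encoding="utf-8").splitlines()))  # 2
```

Validating each line before writing it out avoids discovering a truncated line only after the merged dataset has been uploaded.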