Build an Initial Code Suggestions LangSmith Dataset
Goal
As part of the larger prompt testing effort tracked in Prompt and AI Feature Evaluation setup and work... (&13952), we would like to build an initial dataset for testing Code Suggestions, covering both code completion and code generation. A shared dataset and shared evaluators for Code Suggestions will let us iterate on models and prompts faster.
This issue covers only setting up the initial dataset, not building a comprehensive testing suite. Once we have the initial data, we can decide how best to iterate and build on the dataset.
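To make the discussion concrete, here is a minimal sketch of what individual dataset records might look like. It is only an illustration: the `SuggestionExample` class, its field names (`task`, `language`, `prefix`, `suffix`, `expected`), and the sample contents are all hypothetical and not an agreed schema. The only thing it relies on is that a LangSmith example is a pair of `inputs` and `outputs` dictionaries.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class SuggestionExample:
    """Hypothetical shape for one Code Suggestions test case (not an agreed schema)."""
    task: Literal["completion", "generation"]  # which Code Suggestions mode is exercised
    language: str                              # programming language of the file
    prefix: str                                # content before the cursor
    suffix: str                                # content after the cursor
    expected: str                              # reference suggestion to compare against


def to_record(example: SuggestionExample) -> dict:
    """Split an example into the inputs/outputs dictionaries a LangSmith example holds."""
    return {
        "inputs": {
            "task": example.task,
            "language": example.language,
            "prefix": example.prefix,
            "suffix": example.suffix,
        },
        "outputs": {"expected": example.expected},
    }


# Illustrative samples only; the real dataset contents are still to be decided.
EXAMPLES = [
    SuggestionExample(
        task="completion",
        language="python",
        prefix="def add(a, b):\n    return ",
        suffix="\n",
        expected="a + b",
    ),
    SuggestionExample(
        task="generation",
        language="ruby",
        prefix="# Return the sum of all numbers in the array\n",
        suffix="",
        expected="def sum(numbers)\n  numbers.sum\nend",
    ),
]

RECORDS = [to_record(example) for example in EXAMPLES]
```

Records in this shape could then be uploaded with the LangSmith SDK. Keeping `task` as an input field is one option for storing completion and generation cases in a single dataset; separate datasets per task would work as well.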
About LangSmith
LangSmith is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can ship quickly and with confidence.
We have created documentation for using LangSmith at GitLab here: https://handbook.gitlab.com/handbook/engineering/development/data-science/ai-powered/ai-framework/evaluation/