Add dataset to evaluate library instruction in code generation prompt
Summary
As part of gitlab-org/gitlab#466400 (closed), we need a dataset that can evaluate how well the LLMs use the libraries list (provided by Repository Xray) to generate the most applicable code. We will start by using a static list of Ruby/Rails libraries. The libraries portion of the prompt will be the variable we adjust to determine whether a different sentence structure or wording improves the generated code.
Proposal
- Come up with Ruby and/or Rails libraries to include in the prompt
- Add another dataset for testing RAG code generation; each entry must include an expected answer
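To make the proposal concrete, here is a minimal sketch of what one dataset entry could look like. All field names (`prompt`, `libraries`, `expected_answer`), the helper `build_entry`, and the example library list are assumptions for illustration, not the final schema:

```python
import json

# Assumed static list of Ruby/Rails libraries (placeholder values).
RUBY_LIBRARIES = ["rails", "devise", "sidekiq", "pundit"]

def build_entry(instruction, expected_answer, libraries=RUBY_LIBRARIES):
    """Build one eval record: the libraries clause is the variable under test."""
    libraries_clause = (
        "The project uses the following libraries: "
        + ", ".join(libraries) + "."
    )
    return {
        # Libraries clause prepended to the instruction; wording of this
        # clause is what the evaluation will vary.
        "prompt": libraries_clause + "\n" + instruction,
        "libraries": libraries,
        # Ground-truth answer required for scoring generated code.
        "expected_answer": expected_answer,
    }

entry = build_entry(
    "Write a background job that sends a welcome email.",
    "class WelcomeEmailJob\n  include Sidekiq::Job\nend",
)
print(json.dumps(entry, indent=2))
```

Storing entries like this (e.g. as JSONL) would let us swap in alternative phrasings of the libraries clause while keeping the instruction and expected answer fixed.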
Edited by Leaminn Ma