Add dataset to evaluate library instruction in code generation prompt
Summary
As part of gitlab-org/gitlab#466400 (closed), we need a dataset that can evaluate how well the LLMs use the libraries list (provided by Repository Xray) to generate the most applicable code. We will start by using a static list of Ruby/Rails libraries. The libraries portion of the prompt will be the variable we adjust to determine whether a different sentence structure or wording improves the generated code.
Proposal
- Come up with Ruby and/or Rails libraries to include in the prompt
- Add another dataset for testing RAG code generation; each entry must include an expected answer
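To make the proposal concrete, here is a minimal sketch of what one dataset entry could look like. All field names (`prompt`, `libraries`, `expected_answer`), the helper `build_entry`, and the example library list are assumptions for illustration, not the final schema:

```python
import json

# Assumed static list of Ruby/Rails libraries (placeholder values).
RUBY_LIBRARIES = ["rails", "devise", "sidekiq", "pundit"]

def build_entry(instruction, expected_answer, libraries=RUBY_LIBRARIES):
    """Build one eval record: the libraries clause is the variable under test."""
    libraries_clause = (
        "The project uses the following libraries: "
        + ", ".join(libraries) + "."
    )
    return {
        # Libraries clause prepended to the instruction; wording of this
        # clause is what the evaluation will vary.
        "prompt": libraries_clause + "\n" + instruction,
        "libraries": libraries,
        # Ground-truth answer required for scoring generated code.
        "expected_answer": expected_answer,
    }

entry = build_entry(
    "Write a background job that sends a welcome email.",
    "class WelcomeEmailJob\n  include Sidekiq::Job\nend",
)
print(json.dumps(entry, indent=2))
```

Storing entries like this (e.g. as JSONL) would let us swap in alternative phrasings of the libraries clause while keeping the instruction and expected answer fixed.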
Edited by Leaminn Ma