V2 Generation/Completion Data Collection (!29) · Merge requests · GitLab.org / ModelOps / Code Suggestions Model Evaluation / model-evaluator

This MR merges the data collected from generation/completion prompts using the evaluation script.

Some quick facts that might help a v3 data collection process (if there is ever a need for one):

the total run time for generation was about 5 1/2 hours start to finish (this is approximate because I lost wifi with 10 prompts to go, but it is fairly accurate)
vertex_gecko has not returned results yet (that should be fixed by the end of the week)
the program hung on completion prompt number 15 for JavaScript. The wording of the prompt seemed to be the issue, as slightly changing the prompt resulted in a response from the models. This is the only occurrence I have seen so far.
the sleep() duration between API calls might be overkill; a lower value has not been tested
the total number of API calls is 25 prompts per language x 4 languages x 6 models (no longer including gitlab_model) x 10 runs for each prompt = 6,000 calls
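
For anyone planning a v3 run, the call count and the sleep() note above can be sanity-checked with a small sketch. The function names and the sleep values tried here are hypothetical and are not taken from the evaluation script; only the four factors come from the list above. It assumes one sleep() after every API call.

```python
from math import prod

# Factors taken from the list above.
CALL_FACTORS = {
    "prompts_per_language": 25,
    "languages": 4,
    "models": 6,  # gitlab_model is no longer included
    "runs_per_prompt": 10,
}


def total_api_calls(factors=CALL_FACTORS):
    """Multiply the factors together to get the number of API calls."""
    return prod(factors.values())


def sleep_overhead_hours(sleep_seconds, calls=None):
    """Hours spent purely in sleep(), assuming one sleep per API call."""
    if calls is None:
        calls = total_api_calls()
    return calls * sleep_seconds / 3600


if __name__ == "__main__":
    print("total calls:", total_api_calls())
    # Hypothetical sleep values, NOT the value used in the actual run.
    for seconds in (0.5, 1.0, 2.0):
        hours = sleep_overhead_hours(seconds)
        print(f"sleep={seconds}s -> {hours:.2f} h spent sleeping")
```

Because the sleep time scales linearly with the number of calls, even a small reduction per call adds up over a full run, which is why testing a lower value may be worthwhile.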

Edited Jul 17, 2023 by Dylan Bernardi
