V2 Generation/Completion Data Collection
This MR merges the data collected from generation/completion prompts using the evaluation script.
Some quick facts that might help a v3 data collection process (if there is ever a need for one):
- the total run time for generation was about 5 1/2 hours start to finish (this isn't exact because I lost Wi-Fi with 10 prompts to go, but it is fairly accurate)
- `vertex_gecko` has yet to return results (that should be fixed by the end of the week)
- the program hung on completion prompt number 15 for JavaScript. The wording of the prompt seemed to be the issue, as slightly changing the prompt resulted in a response from the models. This is the only occurrence I have seen of this thus far.
- the amount of `sleep()` between each API call might be overkill; a lower number has not been tested
- the total number of API calls is 6,000 (25 prompts per language x 4 languages x 6 models (no longer including `gitlab_model`) x 10 runs for each prompt)
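
For planning a v3 run, the call count above and the cost of the `sleep()` delay can be worked out in a few lines. This is only a rough sketch: the function names and the candidate delays are hypothetical, since the actual `sleep()` value used by the evaluation script is not stated here, and it assumes exactly one sleep per API call.

```python
# Counts taken from the quick facts above.
PROMPTS_PER_LANGUAGE = 25
NUM_LANGUAGES = 4
NUM_MODELS = 6  # gitlab_model is no longer included
RUNS_PER_PROMPT = 10


def total_api_calls(prompts, languages, models, runs):
    """Return the number of API calls for one full collection run."""
    return prompts * languages * models * runs


def sleep_overhead_hours(num_calls, sleep_seconds):
    """Hours spent purely in sleep(), assuming one sleep after every call."""
    return num_calls * sleep_seconds / 3600


calls = total_api_calls(
    PROMPTS_PER_LANGUAGE, NUM_LANGUAGES, NUM_MODELS, RUNS_PER_PROMPT
)
print(calls)  # 6000

# Hypothetical delays, only to show how the waiting time scales.
for delay in (0.5, 1.0, 2.0):
    hours = sleep_overhead_hours(calls, delay)
    print(f"{delay:.1f}s sleep -> {hours:.2f} h of waiting")
```

Comparing the waiting time for a candidate delay against the roughly 5 1/2 hour total gives a quick sense of how much a lower `sleep()` value could save, before actually testing it against the APIs' rate limits.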
Edited by Dylan Bernardi