
V2 Generation/Completion Data Collection

Dylan Bernardi requested to merge v2-generate-generation-completion-results into main

This MR merges the data collected from generation/completion prompts using the evaluation script.

Some quick facts that might help a v3 data collection process (if there is ever a need for one):

  • the total run time for generation was about 5.5 hours start to finish (this is approximate because I lost wifi with 10 prompts to go, but it is fairly accurate)
  • vertex_gecko has not returned results yet (that should be fixed by the end of the week)
  • the program hung on completion prompt number 15 for JavaScript. The wording of the prompt seemed to be the issue, as slightly changing the prompt resulted in a response from the models. This is the only occurrence of this I have seen so far.
  • the sleep() delay between API calls might be overkill; a lower value has not been tested
  • the total number of API calls is 6,000 (25 prompts per language x 4 languages x 6 models (no longer including gitlab_model) x 10 runs for each prompt)
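
For anyone planning a v3 run, here is a minimal sketch of the call-count arithmetic above, plus a rough average wall-clock time per call derived from the ~5.5 hour figure. The function and parameter names are hypothetical and not taken from the evaluation script. The per-call average includes both the sleep() delay and the model response time, so it is only an upper bound on how long the sleep itself could be.

```python
# Hypothetical helper names; the numbers come from the facts listed above.

def total_api_calls(prompts_per_language=25, languages=4, models=6, runs_per_prompt=10):
    """Total calls = prompts per language x languages x models x runs per prompt."""
    return prompts_per_language * languages * models * runs_per_prompt


def average_seconds_per_call(total_hours, calls):
    """Average wall-clock seconds per call (sleep plus response time)."""
    return total_hours * 3600 / calls


calls = total_api_calls()
print(calls)                                          # 6000
print(round(average_seconds_per_call(5.5, calls), 2)) # 3.3
```

Since the run time is approximate, the 3.3 second average is too. It may still be a useful starting point when testing a shorter sleep() value.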

cc @jayswain @allison.browne @srayner @andrei.zubov

Edited by Dylan Bernardi
