Baseline DeepSeekCoder Models

For each variant of the DeepSeekCoder Base models, host the model in the local GDK and run it against the complete Code Suggestions datasets for Code Generation (MBPP and code_generation_v2 (development)) and Code Completion (dataset_v2) to establish performance baselines. Follow the steps outlined in Local Model Baselining and Prompt Development f... (#468933)

Each model's performance on each dataset is documented in this issue's description/comments.
Push baseline results to BigQuery > dev-ai-research-0e2f897 > custom_models to enable a dashboard view of model/feature performance.
Identify models with strong performance on a task and create an issue for prompt creation/iteration on that task.
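
As a rough illustration of preparing results for the BigQuery push, the sketch below serializes baseline results as newline-delimited JSON, a format BigQuery load jobs accept. The record fields (`model`, `task`, `dataset`, `metric`, `score`) and the `BaselineResult` and `to_ndjson` names are hypothetical and not taken from the actual custom_models schema; adjust them to match the real table before loading.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BaselineResult:
    # Hypothetical record layout; the real custom_models schema may differ.
    model: str
    task: str
    dataset: str
    metric: str
    score: float


def to_ndjson(results):
    """Serialize results as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in results)


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    example = [
        BaselineResult("example-model", "code_completion", "dataset_v2",
                       "avg_similarity", 0.5),
    ]
    print(to_ndjson(example))
```

The resulting text can be written to a file and loaded with whichever BigQuery tooling the team already uses.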

For documentation questions to be considered for support, the baseline model should reach:
- for code completion, an average similarity score of at least 0.8
  - to understand the basis for this minimum similarity score threshold, reference foundational model performances for the task on the Code Completion Dashboard
- for code generation, a score of at least 3.7 on a scale of 1-4
  - to understand the basis for this minimum score threshold, reference foundational model performances for the task on the Chat Dashboard
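
The two thresholds above can be checked with a small helper like the sketch below. It assumes per-example scores are available as plain lists of numbers and that the code generation threshold applies to the average score, which the issue does not state explicitly; the function and constant names are illustrative only.

```python
# Thresholds taken from the criteria in this issue.
COMPLETION_MIN_SIMILARITY = 0.8   # average similarity score, code completion
GENERATION_MIN_SCORE = 3.7        # score on a 1-4 scale, code generation


def mean(scores):
    """Return the arithmetic mean of a non-empty list of scores."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def meets_baseline(completion_similarities, generation_scores):
    """Report, per task, whether the averaged scores reach the threshold.

    Assumption: the code generation threshold is applied to the mean score.
    """
    return {
        "code_completion": mean(completion_similarities) >= COMPLETION_MIN_SIMILARITY,
        "code_generation": mean(generation_scores) >= GENERATION_MIN_SCORE,
    }


if __name__ == "__main__":
    # Placeholder scores for illustration only, not measured results.
    print(meets_baseline([0.9, 0.8], [4, 3.5]))
```

Returning one flag per task keeps the two criteria independent, so a model that is strong on only one task can still be identified for prompt iteration on that task.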

Edited Aug 08, 2024 by Susie Bitters