Evaluation for Model Updates

Overview

This page documents the perplexity of each model update.

LOWER IS BETTER! Learn about Perplexity
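
For reference, perplexity is the exponential of the average negative log-likelihood per token, which is why a lower value means the model is less "surprised" by the evaluation code. The snippet below is a minimal sketch, assuming per-token natural-log probabilities are already available from the model. The function name `perplexity` and the sample values are illustrative assumptions, not part of our evaluation pipeline or its results.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p(token))) for a sequence of natural-log probabilities.

    `token_logprobs` is a hypothetical input: one log-probability per
    evaluated token, as assigned by the model to the true next token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# Illustrative values only (not measurements from any checkpoint):
# a model that assigns probability 0.25 to every token behaves like a
# uniform guess among 4 options, so its perplexity is approximately 4.
uniform_guess = [math.log(0.25)] * 8
print(perplexity(uniform_guess))

# A model that is more confident about the correct tokens scores lower.
confident = [math.log(0.9)] * 8
print(perplexity(confident) < perplexity(uniform_guess))
```

Because the metric is an average over tokens, per-language numbers are only comparable between checkpoints when they are computed on the same evaluation set with the same tokenizer.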

Perplexity Update for 8th May, model id: `16b_v5`

Model step: 4.2K steps. We can see:

- Slightly worse on go, java, javascript, python
- Better on c, c#, c++, php, ruby, rust, scala, typescript

Next steps: We will continue training and evaluate another checkpoint later in the week. The steps trained with bfloat16 so far further support our hypothesis.

Perplexity Update for 1st May, model id: `16b_v4`

Model step: 1.5K steps. Performance is better for rust and scala but worse for the rest of the languages.

Next steps: We will continue training and evaluate another checkpoint tomorrow, and continue investigating the hypothesized bug.

Edited May 17, 2023 by Taylor McCaslin
