
Adjust the mistral prompt for mixtral8x22b (MoE)

Mohamed Hamda requested to merge mixtral-22b-prompt into master

What does this MR do and why?

In this merge request, we adjust the Mistral prompt to be Mixtral8x22B (MoE) friendly and eliminate hallucinations.

While testing Mixtral8x22B, we encountered hallucinations caused by the few-shot examples in the prompt: the MoE model was copying content from these examples into its responses.

For instance, we observed significant hallucinations where the output mentioned "Arkansas": Screenshot_2024-06-03_at_14.26.06
This term came from one of our input examples: Screenshot_2024-06-03_at_14.27.30

After removing these examples and modifying the prompt to act as a code generation agent rather than a code completion one, we achieved much clearer and more accurate results: Screenshot_2024-06-03_at_14.40.27
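To illustrate the change described above, here is a minimal sketch of a completion-style prompt with embedded examples versus an example-free generation-agent prompt. These templates are illustrative assumptions only, not the actual prompt shipped in this MR:

```python
# Illustrative sketch only: NOT the actual prompt from this MR. It contrasts
# a completion-style prompt carrying few-shot examples (which the MoE tended
# to copy into its output) with an example-free generation-agent prompt.

# Completion-style prompt with embedded examples (hallucination-prone):
COMPLETION_PROMPT = """Complete the code below.

Example:
# Input: states = ["Arkansas", ...]
# Output: ...

{prefix}"""

# Generation-agent prompt without examples:
GENERATION_PROMPT = """You are a code generation agent.
Generate the code that belongs between the prefix and the suffix.
Return only code, with no explanations.

<prefix>{prefix}</prefix>
<suffix>{suffix}</suffix>"""


def build_prompt(prefix: str, suffix: str = "") -> str:
    """Render the generation-style prompt for a single completion request."""
    return GENERATION_PROMPT.format(prefix=prefix, suffix=suffix)
```

With no examples embedded, there is no example text (such as "Arkansas") for the model to leak into its responses.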

Mixtral prompt evaluation: Control similarity score ~0.91, Test similarity score ~0.89 (previously ~0.87)

  • Control : 0.91
  • Current: 0.88769758238512009, Variance: -0.2
  • Sample Size: 425
  • Success: Yes; for a Mistral-family model compared to Anthropic, this is a good result.

On GCP

SELECT avg(similarity_score) FROM `dev-ai-research-0e2f8974.code_suggestion_experiments.mhamda_mixtral_22b_20240603_150423__similarity_score` LIMIT 1000
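The BigQuery statement above simply averages the per-sample similarity scores. The same aggregation can be sketched locally; the score values below are made-up placeholders, not the real 425-sample experiment data:

```python
from statistics import mean

# Hypothetical per-sample similarity scores (placeholders, not the actual
# experiment data stored in the BigQuery results table).
similarity_scores = [0.91, 0.84, 0.89, 0.90]

# Equivalent of: SELECT avg(similarity_score) FROM <results table>
avg_similarity = mean(similarity_scores)
print(round(avg_similarity, 4))
```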

Screenshot_2024-06-03_at_15.39.36

Mistral prompt evaluation: Control similarity score ~0.91, Test similarity score ~0.86 (previously ~0.87)

  • Control : 0.91
  • Before: 0.86543818249421944
  • Current: 0.85687403566696974, Variance: < -0.1
  • Sample Size: 425
  • Success: Yes; for a Mistral-family model compared to Anthropic, this is a good result.

On GCP

SELECT avg(similarity_score) FROM `dev-ai-research-0e2f8974.code_suggestion_experiments.mhamda_mistral-2nd-run_20240603_154840__similarity_score` LIMIT 1000

Screenshot_2024-06-03_at_16.09.20

We can definitely iterate on the prompt further, and there is an open issue for that, but the current prompt works for both Mistral and Mixtral.

Edited by Mohamed Hamda
