Quality of code completions is below acceptable levels when a model is pinned for code completion via model switching
After connecting code completions to use model switching, I noticed the following:
- Quality of code completions is low, or no suggestions are returned at all, when the model is pinned to Fireworks Codestral:
Example:
```
2025-06-07_12:31:18.97793 gitlab-ai-gateway : 2025-06-07 14:31:18 [debug ] code completion suggestion: api_key=None correlation_id=01JX55ZEBZPPMKS0632SGGG7MD language=ruby score=100000 suggestion=
```
Here no suggestion was generated even though the code completion request itself returned 200 OK.
- The file's language is not identified properly in the prompt:
```
2025-06-07 14:31:17 [info ] Performing LLM request api_key=None correlation_id=01JX55ZEBZPPMKS0632SGGG7MD prompt='System: As a 9 code completion assistant that generates only 9 syntax, your responsibility is to fill in the precise missing 9 code that seamlessly bridges a provided \'prefix\' and \'suffix\'. \n\n### Requirements:\n1. Complete the functionality with exact 9 code adhering to the given prefix and suffix. \n2. Provide only the missing code snippet essential for functionality, strictly conforming to 9 syntax...
```
For some reason, the prompt identifies the language of the file as "9", even though I was working on a Ruby file, which is strange.
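One plausible explanation (purely a guess on my part, with hypothetical names and values) is that a numeric language ID is being interpolated into the prompt template instead of a human-readable language name:

```python
from enum import IntEnum

# Hypothetical sketch: the enum name and the value 9 are assumptions,
# not the real ai_gateway definitions.
class LanguageId(IntEnum):
    RUBY = 9

lang = LanguageId.RUBY

# Buggy: interpolating the enum's numeric value yields "9" in the prompt
buggy_prompt = f"As a {lang.value} code completion assistant"

# Intended: interpolate a human-readable language name
fixed_prompt = f"As a {lang.name.capitalize()} code completion assistant"

print(buggy_prompt)  # As a 9 code completion assistant
print(fixed_prompt)  # As a Ruby code completion assistant
```

If something like this is happening, it would explain why exactly the string "9" shows up wherever the language name belongs.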
- When we use code completion via model switching, we seem to skip the post-processing steps that are applied to code completions today in https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/83c3a84278d0609c918370b01cbc88368a25fa63/ai_gateway/code_suggestions/processing/post/completions.py. This affects the quality of the results.
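For context, the kind of post-processing that gets skipped might look roughly like this. This is a simplified sketch, not the real implementation; the actual steps live in the file linked above, and the function name here is illustrative:

```python
def post_process(suggestion: str, suffix: str) -> str:
    """Illustrative sketch of post-processing for code completions.

    Approximates two common steps: dropping empty suggestions and
    trimming text that merely repeats the user's existing suffix.
    """
    # Drop whitespace-only completions so the editor is not sent an empty hint
    if not suggestion.strip():
        return ""
    # Avoid duplicating code the user already has: truncate the suggestion
    # where it starts repeating the first line of the suffix
    first_suffix_line = suffix.lstrip().splitlines()[0] if suffix.strip() else ""
    if first_suffix_line and first_suffix_line in suggestion:
        suggestion = suggestion[: suggestion.index(first_suffix_line)]
    return suggestion.rstrip()
```

Skipping steps like these could explain both the empty `suggestion=` log entries and lower perceived quality.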
- Code completion models like Vertex and Fireworks have custom specifications today (https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/models/litellm.py#L126-139), but these specifications are not applied when the model is configured via model switching.
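To illustrate what is being lost, the per-model specifications are roughly of this shape. All names and values below are assumptions for illustration only; the real definitions are in the `litellm.py` file linked above:

```python
# Hypothetical per-model completion parameters, similar in spirit to the
# specifications in ai_gateway/models/litellm.py (names/values are assumed).
MODEL_SPECIFICATIONS = {
    "codestral": {
        # stop sequences keep the model from running past the insertion point
        "stop": ["\n\n", "[PREFIX]", "[SUFFIX]"],
        "max_tokens": 64,
        "temperature": 0.95,
    },
}

def completion_params(model_name: str) -> dict:
    # When the model is pinned via model switching, a lookup like this is
    # apparently skipped and generic defaults are used instead -- one
    # suspected source of the quality gap.
    return MODEL_SPECIFICATIONS.get(model_name, {})
```

Without the model-specific stop sequences and sampling parameters, completions can run long or trail into unrelated text.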
- On gitlab.com today, when Fireworks is used as the model, it does not appear to use the Codestral prompt at all (how does it work then?); at least no prompt details show up in the logs. However, when Fireworks Codestral is pinned, this prompt is used, which may be another contributing factor to the poor quality of the results.