Improve Codestral code completion performance and quality
Description
We've identified several opportunities to improve the Codestral code completion feature based on initial testing and feedback. This issue tracks the key areas for investigation and improvement:
- Bracket closing behavior (DRI: TBA)
  - Investigate and fix the issue where Codestral tries to close every bracket, appending multiple closing brackets to its suggestions
  - Determine whether this is related to VS Code's auto-closing behavior, and test with that option disabled
- Stop sequences and output length (DRI: TBA)
  - Implement the same newline stop sequence (\n\n) used for Code Gecko to limit output length
  - Evaluate if this improves performance without sacrificing quality
  - Consider adjusting max output tokens (currently 128) if needed
- Latency investigation (DRI: TBA)
  - Further investigate latency differences between:
    - Direct Vertex API calls
    - Calls routed through AI Gateway
    - Different regions (especially APAC)
  - Identify bottlenecks and optimize where possible
- Post-processing in AI Gateway (DRI: TBA)
  - Implement post-processing of completions in the AI Gateway, similar to other models
  - Ensure consistency across different IDEs and environments
- Web IDE integration (DRI: TBA)
  - Investigate why Web IDE may not be using Codestral and may instead be falling back to Code Gecko
  - Determine if Web IDE needs updating to use the Language Server for proper Codestral integration
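
To make the stop-sequence and bracket-closing items more concrete, here is a minimal Python sketch of what the post-processing step could look like. It is not the AI Gateway's actual code: the function names (`truncate_at_stop`, `trim_duplicate_closers`, `postprocess`) and the idea of comparing against the text after the cursor (`suffix`) are assumptions made for illustration only.

```python
# Hypothetical post-processing sketch; names and approach are assumptions,
# not the existing AI Gateway implementation.

CLOSING_BRACKETS = ")]}"


def truncate_at_stop(completion: str, stop: str = "\n\n") -> str:
    """Cut the completion at the first occurrence of the stop sequence."""
    index = completion.find(stop)
    return completion if index == -1 else completion[:index]


def trim_duplicate_closers(completion: str, suffix: str) -> str:
    """Drop trailing closing brackets that already follow the cursor.

    `suffix` is the text after the cursor, e.g. a ")" that the editor
    auto-inserted when the user typed "(".
    """
    # Find the leading run of closing brackets in the suffix.
    run_length = 0
    while run_length < len(suffix) and suffix[run_length] in CLOSING_BRACKETS:
        run_length += 1
    existing = suffix[:run_length]

    # Remove the longest prefix of that run that the completion repeats at its end.
    for k in range(len(existing), 0, -1):
        if completion.endswith(existing[:k]):
            return completion[:-k]
    return completion


def postprocess(completion: str, suffix: str) -> str:
    """Apply the stop-sequence cut first, then the bracket trim."""
    return trim_duplicate_closers(truncate_at_stop(completion), suffix)


if __name__ == "__main__":
    # The editor already placed ")" after the cursor; the model closed it again.
    print(repr(postprocess('"hello")\n\nprint("x")', ")")))
```

One known limitation of this sketch: it does not check bracket balance, so a completion such as `foo()` followed by an existing `)` would lose its own closing bracket. A real fix would likely need to count opening and closing brackets inside the completion before trimming.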