[VS Code] - Improve the Speed of Code Suggestions Generation Streaming
Checklist

- [x] I'm using the latest version of the extension (see the latest version in the right column of this page) - Extension version: 4.9.0
- [x] I'm using the latest VS Code version (find the latest version here) - VS Code version: latest
- [x] I'm using a supported version of GitLab (see README for the supported version) - GitLab version: 17.X
Summary
Improve the streaming speed of code generation by revisiting how we process the language server stream.
Steps to reproduce
- Trigger inline completion in a code file
- Have the language server use a fast model such as claude-3-haiku-20240307 to simulate rapid completion responses
- Observe the completion queue growing much faster than the editor can process and display completions
What is the current bug behavior?
When the language model generates completions faster than the VS Code extension can process and display them, the completion queue grows very large. As a result, the 50 ms VS Code inline-suggestion debounce can become a bottleneck: by the time one suggestion is rendered, newer completions are already waiting. Storing these unnecessary intermediate completions also wastes memory.
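One hedged sketch of a possible fix (not the actual extension code; the class and method names below are hypothetical): instead of queuing every intermediate chunk, keep only the latest accumulated completion, so the debounced renderer always shows the newest text and intermediates are dropped rather than stored.

```typescript
// Hypothetical latest-wins buffer. Each streamed chunk overwrites the
// previous one, so the queue never grows and no intermediate completions
// are retained in memory.
class LatestCompletionBuffer {
  private latest: string | null = null;

  // Called for every chunk streamed by the language server.
  push(accumulatedText: string): void {
    this.latest = accumulatedText; // overwrite; intermediates are dropped
  }

  // Called by the debounced renderer; returns and clears the newest text.
  take(): string | null {
    const value = this.latest;
    this.latest = null;
    return value;
  }
}
```

With this shape, the 50 ms debounce still limits how often the UI updates, but each update renders the most recent completion instead of working through a backlog.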
What is the expected correct behavior?
The completion stream displayed in the IDE should keep pace with the responses from the model and the language server.