feat: faster code suggestions generation streaming
Description
I was able to improve the speed at which we stream code generation suggestions by at least 2x. Resolves #1341 (closed)
Demo
How Streaming Currently Works
The flow of the code is as follows:
- When the user triggers inline completion, the `LanguageClientMiddleware.provideInlineCompletionItems` method is called.
- If the LSP server responds with a `START_STREAMING_COMMAND`, the middleware starts listening to the incoming stream using `LanguageClientMiddleware.#listenToIncomingStream`.
- The `createStreamIterator` function sets up listeners for StreamingCompletionResponse notifications and manages the completion queue.
- As completion parts arrive, they are added to the queue, and the `iterator` resolves them when requested by the client.
- The `LanguageClientMiddleware` class updates the UI loading state and manages the active streams based on the completion results.
More on the `createStreamIterator` function:
- Creates an asynchronous iterator for a completion stream.
- Listens for StreamingCompletionResponse notifications from the LSP server.
- Manages a `queue` of completion parts and resolves them as they arrive.
- Sends a CancelStreaming notification to the LSP server when the stream is canceled or detached.
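To make the queue-and-iterator pattern described above concrete, here is a minimal sketch. It is not the extension's actual `createStreamIterator`: the `CompletionPart` fields, `StreamHandle`, and `createQueueBackedIterator` are hypothetical names chosen for illustration, and the LSP wiring (notification listeners, the CancelStreaming notification) is left out.

```typescript
// Hypothetical shape of one streamed chunk; the real CompletionPart may differ.
interface CompletionPart {
  streamId: string;
  completion: string;
  done: boolean;
}

interface StreamHandle {
  // Called by the notification listener whenever a part arrives.
  push(part: CompletionPart): void;
  // Consumed by the client, one part per request.
  iterator: AsyncGenerator<CompletionPart, void, void>;
}

function createQueueBackedIterator(): StreamHandle {
  const queue: CompletionPart[] = [];
  // Resolver of a consumer that is parked on an empty queue, if any.
  let wake: (() => void) | undefined;

  const push = (part: CompletionPart): void => {
    queue.push(part);
    wake?.();
    wake = undefined;
  };

  async function* iterate(): AsyncGenerator<CompletionPart, void, void> {
    while (true) {
      if (queue.length === 0) {
        // Nothing buffered: wait until the next push wakes us up.
        await new Promise<void>((resolve) => {
          wake = resolve;
        });
        continue;
      }
      // Oldest part first, so every part is delivered in arrival order.
      const part = queue.shift()!;
      yield part;
      if (part.done) return;
    }
  }

  return { push, iterator: iterate() };
}
```

In this sketch every part is delivered one at a time, so a fast producer makes the array grow while the consumer is still working through older entries.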
How We Can Improve It
The current implementation is already very solid; however, the `queue` array becomes a bottleneck when the model responds faster than the editor can drain the queue of completions. We also store a very large array of `CompletionPart` objects, which is unnecessary.
By processing the `queue` in batches and trimming the array as it grows, we can let the `iterator` "skip" unnecessary completions. This allows the `queue` to be processed almost in tandem with the model's response rate.
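A rough sketch of the batching and trimming idea, under a stated assumption: each part carries the full completion text generated so far, so an older part is superseded by any newer one and can be skipped without losing text. If parts were deltas instead, skipping would drop content. `TrimmingQueue`, `maxSize`, and `takeLatest` are hypothetical names, not the ones used in this MR.

```typescript
// Hypothetical shape; assumes `completion` holds the cumulative text so far.
interface CompletionPart {
  completion: string;
  done: boolean;
}

class TrimmingQueue {
  private parts: CompletionPart[] = [];

  constructor(private readonly maxSize: number) {}

  push(part: CompletionPart): void {
    this.parts.push(part);
    // Trim: once the buffer exceeds maxSize, drop the oldest entries.
    // The newest part (including a final `done` part) is always kept.
    if (this.parts.length > this.maxSize) {
      this.parts.splice(0, this.parts.length - this.maxSize);
    }
  }

  // Batch drain: empty the buffer and return only the newest part,
  // skipping every intermediate part that it supersedes.
  takeLatest(): CompletionPart | undefined {
    const batch = this.parts.splice(0, this.parts.length);
    return batch[batch.length - 1];
  }

  get size(): number {
    return this.parts.length;
  }
}
```

A consumer that calls `takeLatest()` each time the editor asks for the next item does one unit of work per request no matter how many parts arrived in between, which is what keeps it close to the model's response rate.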
How has this been tested?
- Tested this locally by building and running the extension with the existing LSP implementation.
- I modified my GDK to simulate fast model responses by forcing the model `claude-3-haiku-20240307` to be used. You can do the same here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/code_suggestions/task_factory.rb#L45
Types of changes
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to change)
- Documentation
- Chore (Related to CI or Packaging to platforms)
- Test gap
Edited by Angelo Rivera