
Streaming: Fix LS exposing internal details (intent detection)

Problem

When we implemented streaming, we chose a design that was faster to implement but will be harder to maintain. The design decision is documented on the epic in a status report: &11722 (comment 1698672696)

TL;DR: for every suggestion request, we make two requests to the LS: one to get the intent and the other to get the suggestion/generation.

Solution

The fix is to hide the intent detection inside the LS (Option B below).

This issue affects both the LS and the VS Code Extension.

Details

The proposed solution extends the standard inlineCompletion response with information that tells the client a stream is coming. The best way to attach this "stream metadata" is the command parameter of the InlineCompletionItem: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.18/specification/#textDocument_inlineCompletion

The command can have a string name and arguments of an arbitrary type.

Currently, we use the command parameter for the "suggestion accepted" command. This issue would introduce another command, something like gitlab.ls.streamAttached, with streamId and trackingId arguments (note that today the client generates the streamId, but after this change the server will generate it).
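
For illustration, a minimal TypeScript sketch of the server side, assuming item shapes that follow the LSP 3.18 spec; detectIntent, createStream, and fetchCompletion are hypothetical helpers, and the exact command name and argument shape are still up for discussion:

// Minimal type shapes matching the LSP 3.18 spec.
interface Command {
  title: string;
  command: string;
  arguments?: unknown[];
}

interface InlineCompletionItem {
  insertText: string;
  command?: Command;
}

type Intent = 'completion' | 'generation';

// Hypothetical helpers, named only for illustration.
declare function detectIntent(documentText: string, offset: number): Intent;
declare function createStream(): { streamId: string; trackingId: string };
declare function fetchCompletion(documentText: string, offset: number): Promise<string>;

async function handleInlineCompletion(
  documentText: string,
  offset: number,
): Promise<InlineCompletionItem> {
  if (detectIntent(documentText, offset) === 'generation') {
    // The server now generates the streamId (previously the client did)
    // and announces the stream through the command parameter.
    const { streamId, trackingId } = createStream();
    return {
      insertText: '', // nothing to show yet; the content arrives on the stream
      command: {
        title: 'Stream attached',
        command: 'gitlab.ls.streamAttached',
        arguments: [{ streamId, trackingId }],
      },
    };
  }
  // Plain completion: a regular InlineCompletionItem with no stream metadata.
  return { insertText: await fetchCompletion(documentText, offset) };
}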

The client would then decide whether to simply show the InlineCompletionItem (when there is no stream) or to start streaming.
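
A matching sketch of the client-side decision; startStreaming and showSuggestion are hypothetical stand-ins for the extension's real streaming and rendering code:

// Hypothetical extension-side helpers, named only for illustration.
declare function startStreaming(streamId: string, trackingId: string): void;
declare function showSuggestion(insertText: string): void;

// `item` has the InlineCompletionItem shape from the previous sketch.
function handleResponseItem(item: {
  insertText: string;
  command?: { title: string; command: string; arguments?: unknown[] };
}): void {
  const cmd = item.command;
  if (cmd && cmd.command === 'gitlab.ls.streamAttached') {
    // Stream metadata attached: extract the IDs and start streaming.
    const [ids] = (cmd.arguments ?? []) as [{ streamId: string; trackingId: string }];
    startStreaming(ids.streamId, ids.trackingId);
    return;
  }
  // No stream coming: show the suggestion as a regular inline completion.
  showSuggestion(item.insertText);
}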

Other considerations

  • there should be an LS configuration that indicates whether the client can handle streaming; if it cannot, the LS should fall back to the current method (or alternatively consume the full stream? [this should be discussed with the backend AI Framework team]); a possible fallback is sketched below
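
A possible shape for that fallback on the server, assuming a hypothetical clientSupportsStreaming flag and the helpers from the first sketch:

// Hypothetical configuration flag and helpers; all names are assumptions.
declare function fetchCompletion(documentText: string, offset: number): Promise<string>;
declare function handleInlineCompletion(
  documentText: string,
  offset: number,
): Promise<{ insertText: string }>;

async function respond(
  documentText: string,
  offset: number,
  clientSupportsStreaming: boolean, // would come from the LS configuration
): Promise<{ insertText: string }> {
  if (!clientSupportsStreaming) {
    // Fall back to the current method: never attach stream metadata.
    // (Alternatively, the LS could consume the full stream itself and
    // return the final text; to be discussed with the AI Framework team.)
    return { insertText: await fetchCompletion(documentText, offset) };
  }
  // Client can stream: use the flow from the first sketch.
  return handleInlineCompletion(documentText, offset);
}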

Original decision description

Option A: 2 calls to LS

Use @shekharpatnaik's approach of keeping the streaming completely separate. For each completion request from the IDE, we first need to ask the LS whether we should do completion or generation, and then call the appropriate LS endpoint.

sequenceDiagram
    VS Code Extension->>+LS: should I do completion or generation
    LS->>-VS Code Extension: generation
    VS Code Extension->>+LS: get me stream for position XYZ
    LS->>-VS Code Extension: stream
  • Pros:
    • the completely separate code means the lowest risk to the existing suggestion code
    • fast delivery because we don't have to touch the existing code
  • Cons:
    • we'll have to duplicate logic for debouncing, telemetry, cancellation and other features
    • we expose implementation details to the client (intent)
    • we increase the maintenance cost of the feature
    • more complex implementation for all clients

Option B: 1 call to LS

We extend the LS protocol for inline completion with the possibility of a follow-up stream. The LS decides whether it should do completion or generation; for generation, it will start streaming to the client.

sequenceDiagram
    VS Code Extension->>+LS: give me inline completion for position XYZ
    LS->>-VS Code Extension: completion or stream based on intent
  • Pros:
    • unified logic, LS fully controls the feature
    • reuse of existing debouncing, cancellation and possibly parts of the telemetry (once we know what telemetry looks like)
    • less maintenance cost
    • simpler implementation for all clients
  • Cons:
    • slower initial delivery because we first have to decide which parts of the completion and generation logic should be shared