Add batching of streamed chat response
## What does this MR do and why?
This adds batching for streamed chat responses by only emitting a message on every second token, with each message carrying the tokens accumulated since the previous one.
Because every token currently incurs significant overhead from our GraphQL subscription JSON envelope, this should roughly halve the number of WebSocket messages we send per chat response.
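
For illustration, a minimal sketch of the batching described above (in Python; `stream_batched` and `send_message` are hypothetical stand-ins for the real streaming callback and the GraphQL-subscription trigger, not this MR's actual code):

```python
from typing import Callable, Iterable

BATCH_SIZE = 2  # emit a subscription message only every second token


def stream_batched(tokens: Iterable[str], send_message: Callable[[str], None]) -> None:
    """Accumulate streamed tokens and flush every BATCH_SIZE tokens."""
    buffer: list[str] = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) == BATCH_SIZE:
            # One message now carries two tokens, roughly halving the
            # per-message JSON/WebSocket overhead.
            send_message("".join(buffer))
            buffer.clear()
    if buffer:
        # Flush any trailing token so the tail of the response is not lost.
        send_message("".join(buffer))
```
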
Changelog: changed

EE: true
## MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
## Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
| Before | After |
| ------ | ----- |
|        |       |
## How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
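
A possible validation flow (assuming a local GDK setup with GitLab Duo Chat enabled, and that streamed answers arrive over the `aiCompletionResponse` GraphQL subscription as for other chat responses):

1. Open GitLab Duo Chat and ask a question that produces a streamed answer.
1. In the browser dev tools, watch the WebSocket frames on the ActionCable connection.
1. Before this change, each token arrives in its own subscription message; with it, messages should arrive roughly half as often, each carrying the tokens accumulated since the previous one.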