Support streaming in Chat API
What does this merge request do and why?
This MR adds streaming support to the `v1/chat/agent` endpoint. The streamed response is consumed by the GitLab-Rails counterpart.
This is a high-priority MR for https://gitlab.com/groups/gitlab-org/-/epics/10585 and Supporting GitLab Duo (chat) for SM and Dedicated (gitlab-org&11251 - closed).
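Conceptually, the change amounts to returning a streamed response when the request sets the `stream` flag, and a plain JSON body otherwise. The snippet below is only an illustrative sketch of that idea, not the code in this MR; `AgentRequest` and `fake_model_stream` are hypothetical stand-ins for the real request model and provider client:

```python
# Illustrative sketch only; names below are hypothetical, not the MR's code.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class AgentRequest(BaseModel):
    prompt_components: list[dict]
    stream: bool = False


async def fake_model_stream(prompt: str):
    # Stand-in for the provider client: yield completion chunks as they arrive.
    for chunk in ["I'm ", "an AI ", "assistant."]:
        yield chunk


@app.post("/v1/chat/agent")
async def chat(request: AgentRequest):
    prompt = request.prompt_components[0]["payload"]["content"]
    if request.stream:
        # Streamed path: chunked text/event-stream body, as in the curl output below.
        return StreamingResponse(fake_model_stream(prompt), media_type="text/event-stream")
    # Non-streaming path: buffer the completion and return a single JSON document.
    return {"response": "".join([chunk async for chunk in fake_model_stream(prompt)])}
```

The `text/event-stream` content type and chunked transfer encoding visible in the curl output below correspond to the streamed path.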
How to set up and validate locally
- Run AI Gateway:

  ```shell
  poetry run ai_gateway
  ```
With streaming (SSE):

```shell
shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST' 'http://0.0.0.0:5052/v1/chat/agent' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
"prompt_components": [
{
"type": "string",
"metadata": {
"source": "string",
"version": "string"
},
"payload": {
"content": "\n\nHuman: Can you sing a song?\n\nAssistant:",
"provider": "anthropic",
"model": "claude-2.0"
}
}
],
"stream": "True"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 331
>
< HTTP/1.1 200 OK
< date: Fri, 08 Dec 2023 09:03:15 GMT
< server: uvicorn
< content-type: text/event-stream; charset=utf-8
< x-process-time: 1.3012910810011817
< x-request-id: 55399209587445cab54812d446c3f78c
< transfer-encoding: chunked
<
I'm an AI assistant created by Anthropic to be helpful, harmless, and honest. I don't have the ability to sing songs, but I can try to have a pleasant conversation with you.* Connection #0 to host 0.0.0.0 left intact
```

(Notice the `transfer-encoding: chunked` response header and the `text/event-stream` content type, which indicate the response is streamed.)
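The same streamed request can also be exercised from a script instead of curl. This is only an illustrative sketch (the real consumer is GitLab-Rails), assuming the AI Gateway is running locally as described above:

```python
# Illustrative sketch: read the streamed response chunk by chunk.
import requests

payload = {
    "prompt_components": [
        {
            "type": "string",
            "metadata": {"source": "string", "version": "string"},
            "payload": {
                "content": "\n\nHuman: Can you sing a song?\n\nAssistant:",
                "provider": "anthropic",
                "model": "claude-2.0",
            },
        }
    ],
    "stream": True,
}

with requests.post(
    "http://0.0.0.0:5052/v1/chat/agent", json=payload, stream=True, timeout=60
) as response:
    response.raise_for_status()
    # Chunks are printed as soon as the gateway forwards them from the model.
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```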
Without streaming:

```shell
shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST' \
'http://0.0.0.0:5052/v1/chat/agent' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt_components": [
{
"type": "string",
"metadata": {
"source": "string",
"version": "string"
},
"payload": {
"content": "\n\nHuman: Hi, How are you?\n\nAssistant:",
"provider": "anthropic",
"model": "claude-2.0"
}
}
],
"stream": "False"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 328
>
< HTTP/1.1 200 OK
< date: Wed, 06 Dec 2023 09:28:34 GMT
< server: uvicorn
< content-length: 130
< content-type: application/json
< x-process-time: 0.983616780002194
< x-request-id: fb210b1a10664a44b7e7574a7e0aa12b
<
{"response":" I'm doing well, thanks for asking!","metadata":{"provider":"anthropic","model":"claude-2.0","timestamp":1701854916}}* Connection #0 to host 0.0.0.0 left intact
Merge request checklist
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.
Further reading
Ideally, we should use community libraries like LangChain instead of reinventing the wheel, which is why I opened !475 (closed). However, given the urgency of this feature, we should go with whichever MR can be approved and merged more quickly.