Support streaming in Chat API

Shinya Maeda requested to merge support-streaming-in-chat-api-v2 into main

What does this merge request do and why?

This MR adds streaming support to the v1/chat/agent endpoint. The stream is consumed by the GitLab-Rails counterpart.
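As a rough sketch of the behavior (not the actual AI Gateway code — fake_model_chunks and the hard-coded strings are illustrative stand-ins), the endpoint branches on the stream flag: either yield completion chunks as the model produces them, or buffer the full completion into the existing JSON response shape:

```python
import json
import time
from typing import Iterator, Union

def fake_model_chunks() -> Iterator[str]:
    # Hypothetical stand-in for the model provider's streamed completion.
    yield " I'm doing well,"
    yield " thanks for asking!"

def chat_agent(stream: bool) -> Union[Iterator[str], str]:
    """Branch on the stream flag: return an iterator of chunks (to be sent
    with Transfer-Encoding: chunked), or buffer the whole completion and
    wrap it in the existing JSON response shape."""
    if stream:
        return fake_model_chunks()
    text = "".join(fake_model_chunks())
    return json.dumps({
        "response": text,
        "metadata": {
            "provider": "anthropic",
            "model": "claude-2.0",
            "timestamp": int(time.time()),
        },
    })
```

The non-streaming branch matches the JSON shape shown in the second curl example below; the streaming branch is what produces the chunked text/event-stream response.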

This is a high-priority MR for https://gitlab.com/groups/gitlab-org/-/epics/10585 and Supporting GitLab Duo (chat) for SM and Dedicated (gitlab-org&11251 - closed).

How to set up and validate locally

  1. Run the AI Gateway: poetry run ai_gateway

With streaming (with SSE):

shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST'   'http://0.0.0.0:5052/v1/chat/agent'   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "\n\nHuman: Can you sing a song?\n\nAssistant:",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": "True"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 331
> 
< HTTP/1.1 200 OK
< date: Fri, 08 Dec 2023 09:03:15 GMT
< server: uvicorn
< content-type: text/event-stream; charset=utf-8
< x-process-time: 1.3012910810011817
< x-request-id: 55399209587445cab54812d446c3f78c
< transfer-encoding: chunked
< 
 I'm an AI assistant created by Anthropic to be helpful, harmless, and honest. I don't have the ability to sing songs, but I can try to have a pleasant conversation with you.* Connection #0 to host 0.0.0.0 left intact

(Notice the transfer-encoding: chunked response header: the body arrives in pieces as the model produces them, instead of in one buffered payload.)
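curl (and HTTP client libraries like Net::HTTP on the Rails side) de-chunk this framing transparently, so the transcript above shows only the reassembled text. For illustration, a minimal decoder for a chunked body looks like this (a sketch assuming a well-formed body with no chunk extensions or trailers):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked transfer-encoding body: each chunk is a
    hex length line, CRLF, the chunk data, CRLF; a zero-length chunk ends
    the body."""
    out = bytearray()
    while raw:
        size_line, _, raw = raw.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            break  # terminating chunk
        out += raw[:size]
        raw = raw[size + 2:]  # skip chunk data plus its trailing CRLF
    return bytes(out)
```

For example, decode_chunked(b"5\r\nHello\r\n6\r\n world\r\n0\r\n\r\n") reassembles b"Hello world".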

Without streaming:

shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "\n\nHuman: Hi, How are you?\n\nAssistant:",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": "False"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 328
> 
< HTTP/1.1 200 OK
< date: Wed, 06 Dec 2023 09:28:34 GMT
< server: uvicorn
< content-length: 130
< content-type: application/json
< x-process-time: 0.983616780002194
< x-request-id: fb210b1a10664a44b7e7574a7e0aa12b
< 
{"response":" I'm doing well, thanks for asking!","metadata":{"provider":"anthropic","model":"claude-2.0","timestamp":1701854916}}* Connection #0 to host 0.0.0.0 left intact
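To illustrate what a streaming consumer (such as the GitLab-Rails counterpart) gains over the buffered response, here is a self-contained sketch against a fake stand-in server — FakeGateway and its hard-coded chunks are illustrative, not the actual Gateway code. The client reads the response incrementally, so each chunk is available as soon as it is sent:

```python
import http.client
import http.server
import threading

class FakeGateway(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the AI Gateway's streaming endpoint:
    emits the completion in pieces with Transfer-Encoding: chunked."""
    protocol_version = "HTTP/1.1"  # chunked framing requires HTTP/1.1

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for piece in (b" I'm doing well,", b" thanks for asking!"):
            # Manual chunk framing: hex length, CRLF, data, CRLF.
            self.wfile.write(b"%x\r\n%s\r\n" % (len(piece), piece))
        self.wfile.write(b"0\r\n\r\n")  # terminating chunk

    def log_message(self, *args):  # silence request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), FakeGateway)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("POST", "/v1/chat/agent", body="{}",
             headers={"Content-Type": "application/json"})
resp = conn.getresponse()  # http.client de-chunks transparently
received = []
while True:
    piece = resp.read(16)  # small reads: chunks surface as they arrive
    if not piece:
        break
    received.append(piece.decode())
conn.close()
server.shutdown()
```

A non-streaming consumer would instead call resp.read() once and json-parse the buffered body, as in the second curl example above.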

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.

Further reading

Ideally, we should use community libraries like LangChain instead of reinventing the wheel, which is why I opened !475 (closed). However, given the urgency of this feature, we should go with whichever MR can be approved and merged more quickly.
