
Support streaming in Chat API

Shinya Maeda requested to merge support-streaming-in-chat-api-v2 into main

What does this merge request do and why?

This MR adds streaming support to the v1/chat/agent endpoint. The streaming response is consumed by the GitLab-Rails counterpart.

This is a high-priority MR for https://gitlab.com/groups/gitlab-org/-/epics/10585 and Supporting GitLab Duo (chat) for SM and Dedicated (gitlab-org&11251 - closed).

How to set up and validate locally

  1. Run the AI Gateway: poetry run ai_gateway

With streaming (with SSE):

shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "\n\nHuman: Can you sing a song?\n\nAssistant:",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": true
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 331
> 
< HTTP/1.1 200 OK
< date: Fri, 08 Dec 2023 09:03:15 GMT
< server: uvicorn
< content-type: text/event-stream; charset=utf-8
< x-process-time: 1.3012910810011817
< x-request-id: 55399209587445cab54812d446c3f78c
< transfer-encoding: chunked
< 
 I'm an AI assistant created by Anthropic to be helpful, harmless, and honest. I don't have the ability to sing songs, but I can try to have a pleasant conversation with you.* Connection #0 to host 0.0.0.0 left intact

(Notice the transfer-encoding: chunked response header, which indicates the response is streamed.)
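For reference, a client consuming this endpoint builds the same JSON body as the curl example above and reads the chunked response incrementally. A minimal sketch follows; the helper names (build_payload, consume) are illustrative, not part of this MR, and the field names mirror the curl payload:

```python
import json


def build_payload(content: str, stream: bool = True) -> str:
    """Serialize a chat request body (field names follow the curl example)."""
    return json.dumps({
        "prompt_components": [{
            "type": "string",
            "metadata": {"source": "string", "version": "string"},
            "payload": {
                "content": content,
                "provider": "anthropic",
                "model": "claude-2.0",
            },
        }],
        # true -> chunked text/event-stream response; false -> one JSON body
        "stream": stream,
    })


def consume(chunks) -> str:
    """Accumulate streamed text chunks, e.g. requests' resp.iter_content()."""
    return "".join(c.decode() if isinstance(c, bytes) else c for c in chunks)
```

With the requests library, this would be used roughly as `consume(requests.post(url, data=build_payload(prompt), stream=True).iter_content())`.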

Without streaming:

shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "\n\nHuman: Hi, How are you?\n\nAssistant:",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": false
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 328
> 
< HTTP/1.1 200 OK
< date: Wed, 06 Dec 2023 09:28:34 GMT
< server: uvicorn
< content-length: 130
< content-type: application/json
< x-process-time: 0.983616780002194
< x-request-id: fb210b1a10664a44b7e7574a7e0aa12b
< 
{"response":" I'm doing well, thanks for asking!","metadata":{"provider":"anthropic","model":"claude-2.0","timestamp":1701854916}}* Connection #0 to host 0.0.0.0 left intact
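In the non-streaming case the endpoint returns a single JSON document. A small sketch of parsing it, using the response body captured in the transcript above (the values are from that transcript, not live output):

```python
import json

# Response body copied verbatim from the non-streaming transcript above.
raw = ('{"response":" I\'m doing well, thanks for asking!",'
       '"metadata":{"provider":"anthropic","model":"claude-2.0",'
       '"timestamp":1701854916}}')

doc = json.loads(raw)
print(doc["response"])           # the assistant's reply text
print(doc["metadata"]["model"])  # which model served the request
```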

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.

Further reading

Ideally, we should use community libraries such as LangChain instead of reinventing the wheel, so I opened !475 (closed). However, given the urgency of this feature, we should go with whichever MR can be approved and merged more quickly.

Edited by Shinya Maeda
