
Support streaming in Chat API (with LangChain)

Shinya Maeda requested to merge support-streaming-in-chat-api into main

What does this merge request do and why?

This MR adds streaming support to the v1/chat/agent endpoint.

We use LangChain for both the JSON response and the streaming response. This lets us take advantage of the Python/LangChain community and its battle-tested tools.
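The point of the paragraph above is that a single chain backs both response modes. Here is a minimal sketch of that branching, assuming a hypothetical async chunk generator in place of the real LangChain chain. LangChain runnables expose invoke/ainvoke and stream/astream, which is where the actual calls would go. fake_chain_astream, handle_chat and demo are illustrative names, not the AI Gateway's code.

```python
import asyncio
import json
import time
from typing import AsyncIterator, List, Tuple, Union


async def fake_chain_astream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a LangChain runnable's astream(): yields text chunks."""
    for chunk in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)  # pretend to wait for the next token from the model
        yield chunk


async def handle_chat(prompt: str, stream: bool) -> Union[AsyncIterator[str], str]:
    """Return a chunk iterator when streaming, otherwise a complete JSON body."""
    if stream:
        # Streaming: a web framework would forward each chunk as it is produced
        # (over HTTP/1.1 this typically means transfer-encoding: chunked).
        return fake_chain_astream(prompt)
    # Non-streaming: drain the same source and wrap the full text in JSON.
    text = "".join([chunk async for chunk in fake_chain_astream(prompt)])
    return json.dumps({
        "response": text,
        "metadata": {
            "provider": "anthropic",
            "model": "claude-2.0",
            "timestamp": int(time.time()),
        },
    })


async def demo() -> Tuple[str, List[str]]:
    body = await handle_chat("hi", stream=False)
    chunk_iter = await handle_chat("hi", stream=True)
    chunks = [chunk async for chunk in chunk_iter]
    return body, chunks


if __name__ == "__main__":
    body, chunks = asyncio.run(demo())
    print(body)
    print(chunks)
```

Both branches draw from the same source. The only difference is whether the chunks are forwarded as they arrive or joined before the JSON response is built.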

See LCEL (LangChain Expression Language) for more on LangChain's abstract interfaces.
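LCEL composes components with the | operator into a runnable that offers the same interface (invoke, stream and their async variants) no matter what it wraps. The toy classes below only imitate that idea in plain Python, so the shape is visible without installing LangChain. ToyRunnable, ToySequence and ToyModel are hypothetical names, not LangChain classes, and their streaming behavior is deliberately simpler than LCEL's.

```python
from typing import Callable, Iterator


class ToyRunnable:
    """Minimal imitation of a runnable: invoke() for one result, stream() for chunks."""

    def __init__(self, fn: Callable[[object], object]):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def stream(self, value) -> Iterator:
        # Default: no incremental output, so emit the whole result as one chunk.
        yield self.invoke(value)

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        return ToySequence(self, other)


class ToySequence(ToyRunnable):
    """Pipe composition: feed the output of `first` into `second`."""

    def __init__(self, first: ToyRunnable, second: ToyRunnable):
        self.first, self.second = first, second
        super().__init__(lambda v: second.invoke(first.invoke(v)))

    def stream(self, value) -> Iterator:
        # Run the first step to completion, then stream the last step.
        yield from self.second.stream(self.first.invoke(value))


class ToyModel(ToyRunnable):
    """Hypothetical model that 'generates' by echoing its prompt word by word."""

    def __init__(self):
        super().__init__(lambda prompt: "".join(self.stream(prompt)))

    def stream(self, prompt) -> Iterator[str]:
        for word in str(prompt).split():
            yield word + " "


if __name__ == "__main__":
    chain = ToyRunnable(lambda q: f"Human: {q} Assistant:") | ToyModel()
    print(chain.invoke("hello there"))        # one complete string
    print(list(chain.stream("hello there")))  # several chunks
```

Because invoke and stream are both defined on the composed chain, one piece of code can serve the JSON path and the streaming path. This is the property the MR relies on.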

This also pays down technical debt: chat content was previously passed to model.generate as prefix and _suffix, which was misleading.
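To illustrate the difference, here is a hypothetical sketch (not the MR's code) that passes chat content as explicit chat messages. The alternative it replaces is reusing the prefix and _suffix arguments, which read as code-completion inputs. ChatMessage, to_messages and to_claude_prompt are illustrative names. The component layout is taken from the request body in the validation steps. The final rendering uses the "\n\nHuman:" / "\n\nAssistant:" turn format of Anthropic's text-completion API for claude-2.0, and the role names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChatMessage:
    role: str     # assumed roles: "user" or "assistant"
    content: str


def to_messages(prompt_components: List[dict]) -> List[ChatMessage]:
    """Map each prompt component's payload.content to a user message."""
    return [
        ChatMessage(role="user", content=component["payload"]["content"])
        for component in prompt_components
    ]


def to_claude_prompt(messages: List[ChatMessage]) -> str:
    # Claude 2 text-completion format: alternating Human/Assistant turns,
    # ending with an open Assistant turn for the model to complete.
    turns = "".join(
        f"\n\n{'Human' if m.role == 'user' else 'Assistant'}: {m.content}"
        for m in messages
    )
    return turns + "\n\nAssistant:"
```

Keeping the conversation as a message list means the chat path never has to pretend its input is a code prefix/suffix pair.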

How to set up and validate locally

  1. Run the AI Gateway: poetry run ai_gateway
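Once the gateway is running, the requests from the curl examples can also be sent from a short stdlib script. This is a minimal sketch, assuming the gateway listens on http://0.0.0.0:5052 as in the transcripts. build_chat_request and print_stream are hypothetical helper names.

```python
import codecs
import json
import urllib.request

URL = "http://0.0.0.0:5052/v1/chat/agent"  # same endpoint as the curl examples


def build_chat_request(content: str, stream: bool) -> urllib.request.Request:
    """Build the POST request with the body used in the curl examples."""
    body = {
        "prompt_components": [{
            "type": "string",
            "metadata": {"source": "string", "version": "string"},
            "payload": {"content": content, "provider": "anthropic", "model": "claude-2.0"},
        }],
        "stream": stream,  # sent as a JSON boolean
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def print_stream(request: urllib.request.Request) -> None:
    """Print the response body as it arrives instead of waiting for the end."""
    # Incremental decoder so multi-byte UTF-8 characters split across reads stay intact.
    decoder = codecs.getincrementaldecoder("utf-8")()
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read1(1024)  # return whatever bytes are available
            if not chunk:
                break
            print(decoder.decode(chunk), end="", flush=True)
    print(decoder.decode(b"", final=True))
```

With the gateway running, print_stream(build_chat_request("string", stream=True)) should print the reply incrementally, much like curl -N does.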

With streaming:

shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ curl -v -N -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "string",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": "True"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 292
> 
< HTTP/1.1 200 OK
< date: Fri, 01 Dec 2023 04:56:31 GMT
< server: uvicorn
< x-process-time: 0.12322370000038063
< x-request-id: 5fc8362aa2464326ac25e05a76651e00
< transfer-encoding: chunked
< 
 I apologize, I should not have made assumptions about your preferences. Let's move our conversation in a more positive direction.* Connection #0 to host 0.0.0.0 left intact
shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ 

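(Notice the transfer-encoding: chunked response header, which shows the body is sent in pieces; the non-streaming response below has a fixed content-length instead.)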
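With chunked transfer coding (RFC 9112, section 7.1), the body is a series of chunks, each prefixed with its size in hex, and ends with a zero-length chunk. This is what lets the server send tokens before the full completion exists. curl decodes the chunks itself; -N only turns off curl's output buffering so they appear immediately. The sketch below is a minimal decoder for illustration. It ignores chunk extensions and does not parse trailer fields.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body: <hex size>\\r\\n<data>\\r\\n ... 0\\r\\n\\r\\n."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # Chunk size is hex; anything after ';' is a chunk extension, ignored here.
        size = int(raw[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer fields that follow are not parsed
        body += raw[pos:pos + size]
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    return bytes(body)
```

For example, b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n" decodes to b"Hello, world".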

Without streaming:

shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ curl -v -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "string",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": false
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 291
> 
< HTTP/1.1 200 OK
< date: Fri, 01 Dec 2023 04:57:18 GMT
< server: uvicorn
< content-length: 394
< content-type: application/json
< x-process-time: 3.4186516540012235
< x-request-id: 739997b544df40649c2f96e922557f25
< 
* Connection #0 to host 0.0.0.0 left intact
{"response":" I'm afraid I don't have enough context to determine if that statement is racist or not. Making broad generalizations about groups of people based on race is generally not advised. Perhaps it would be better to judge people as individuals based on their character and actions rather than their race.","metadata":{"provider":"anthropic","model":"claude-2.0","timestamp":1701406642}}shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ 
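The non-streaming body is a single JSON object with response and metadata fields. The following sketch parses it on the client into a hypothetical ChatResponse dataclass that mirrors the fields visible in the transcript. It assumes timestamp is Unix epoch seconds, which matches the date header above but is not stated by the MR.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ChatResponse:
    response: str
    provider: str
    model: str
    created_at: datetime


def parse_chat_response(body: str) -> ChatResponse:
    data = json.loads(body)
    meta = data["metadata"]
    return ChatResponse(
        # In the transcript the model output starts with a space; trim it.
        response=data["response"].strip(),
        provider=meta["provider"],
        model=meta["model"],
        # Assumes "timestamp" is Unix epoch seconds.
        created_at=datetime.fromtimestamp(meta["timestamp"], tz=timezone.utc),
    )
```

Trimming the leading space is a presentation choice; a client that concatenates streamed chunks would keep it.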

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Shinya Maeda
