Support streaming in Chat API (with LangChain) (!475) · Merge requests · GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway

What does this merge request do and why?

This MR adds streaming support to the v1/chat/agent endpoint.

We use LangChain for both the JSON response and the streaming response. This lets us take advantage of the Python / LangChain community and its battle-tested tools.

See LCEL (LangChain Expression Language) for more on LangChain's abstract interfaces.
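
To illustrate the idea of serving both response styles from one model object, here is a minimal sketch. It does not use LangChain itself and is not the code in this MR: `FakeChatModel` and `handle_chat` are hypothetical names, and the stand-in only mirrors the `invoke` / `stream` pair that LangChain runnables expose.

```python
from typing import Dict, Iterator, Union


class FakeChatModel:
    """Hypothetical stand-in for a model exposing invoke() and stream()."""

    def __init__(self, reply: str, chunk_size: int = 4) -> None:
        self.reply = reply
        self.chunk_size = chunk_size

    def invoke(self, content: str) -> str:
        # Return the whole completion at once.
        return self.reply

    def stream(self, content: str) -> Iterator[str]:
        # Yield the same completion in fixed-size pieces.
        for start in range(0, len(self.reply), self.chunk_size):
            yield self.reply[start:start + self.chunk_size]


def handle_chat(
    model: FakeChatModel, content: str, stream: bool
) -> Union[Dict[str, str], Iterator[str]]:
    """Return an iterator of text pieces when streaming, else a JSON-style dict."""
    if stream:
        return model.stream(content)
    return {"response": model.invoke(content)}
```

Because both branches go through the same model object, the caller decides only how the result is delivered, not how it is produced.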

This also pays down some technical debt: chat content was previously passed to model.generate as prefix and _suffix, which is misleading.

GitLab-Rails counterpart: Ai Gateway client for Duo Chat (gitlab-org/gitlab!138274 - merged)
Related: Build a client for AI Gateway to connect duo chat (gitlab-org/gitlab#431563 - closed)

How to set up and validate locally

Run AI Gateway: poetry run ai_gateway

With streaming:

shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ curl -v -N -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "string",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": "True"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 292
> 
< HTTP/1.1 200 OK
< date: Fri, 01 Dec 2023 04:56:31 GMT
< server: uvicorn
< x-process-time: 0.12322370000038063
< x-request-id: 5fc8362aa2464326ac25e05a76651e00
< transfer-encoding: chunked
< 
 I apologize, I should not have made assumptions about your preferences. Let's move our conversation in a more positive direction.* Connection #0 to host 0.0.0.0 left intact
shinya@shinya-XPS-15-9530:~/gitlab-development-kit$

(Notice the transfer-encoding: chunked header, which shows that the response is streamed.)
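
For readers unfamiliar with chunked transfer encoding, the sketch below shows the HTTP/1.1 wire framing: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk ends the body. The helper names `encode_chunked` and `decode_chunked` are hypothetical, and this is not how AI Gateway produces its chunks; uvicorn and clients such as curl handle the framing automatically.

```python
from typing import Iterable, List


def encode_chunked(parts: Iterable[bytes]) -> bytes:
    """Frame byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-length chunk would terminate the body early
        out += f"{len(part):x}".encode("ascii") + b"\r\n" + part + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk, no trailers
    return bytes(out)


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a complete chunked body (without trailers) back into chunks."""
    chunks: List[bytes] = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';'.
        size_field = body[pos:line_end].split(b";")[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            return chunks
        start = line_end + 2
        chunks.append(body[start:start + size])
        pos = start + size + 2  # skip the data and its trailing CRLF
```

The framing lets the server send text as soon as the model produces it, without knowing the total length up front, which is why the streaming response above has no content-length header.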

Without streaming:

shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ curl -v -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "string",
        "provider": "anthropic",
        "model": "claude-2.0"
      }
    }
  ],
  "stream": false
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 291
> 
< HTTP/1.1 200 OK
< date: Fri, 01 Dec 2023 04:57:18 GMT
< server: uvicorn
< content-length: 394
< content-type: application/json
< x-process-time: 3.4186516540012235
< x-request-id: 739997b544df40649c2f96e922557f25
< 
* Connection #0 to host 0.0.0.0 left intact
{"response":" I'm afraid I don't have enough context to determine if that statement is racist or not. Making broad generalizations about groups of people based on race is generally not advised. Perhaps it would be better to judge people as individuals based on their character and actions rather than their race.","metadata":{"provider":"anthropic","model":"claude-2.0","timestamp":1701406642}}shinya@shinya-XPS-15-9530:~/gitlab-development-kit$
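
The non-streaming body is plain JSON, so a client could parse it roughly as follows. This is a sketch only: the field names are taken from the output above, `ChatResponse` and `parse_chat_response` are hypothetical names, and treating `timestamp` as Unix seconds in UTC is an assumption (the value is consistent with the `date` header).

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ChatResponse:
    response: str
    provider: str
    model: str
    timestamp: datetime


def parse_chat_response(body: str) -> ChatResponse:
    """Parse the JSON body returned when "stream" is false."""
    data = json.loads(body)
    meta = data["metadata"]
    return ChatResponse(
        response=data["response"],
        provider=meta["provider"],
        model=meta["model"],
        # Assumption: the timestamp is seconds since the Unix epoch.
        timestamp=datetime.fromtimestamp(meta["timestamp"], tz=timezone.utc),
    )
```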

Merge request checklist

Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.

Edited Dec 01, 2023 by Shinya Maeda
