Support streaming in Chat API
What does this merge request do and why?
This MR adds streaming support to the `v1/chat/agent` endpoint. The streamed response is consumed by the GitLab-Rails counterpart.
This is a high-priority MR for https://gitlab.com/groups/gitlab-org/-/epics/10585 and Supporting GitLab Duo (chat) for SM and Dedicated (gitlab-org&11251 - closed).
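Conceptually, the change amounts to returning a streamed response when the request sets the `stream` flag, and a plain JSON body otherwise. The snippet below is only an illustrative sketch of that idea, not the code in this MR; `AgentRequest` and `fake_model_stream` are hypothetical stand-ins for the real request model and provider client:

```python
# Illustrative sketch only; names below are hypothetical, not the MR's code.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class AgentRequest(BaseModel):
    prompt_components: list[dict]
    stream: bool = False


async def fake_model_stream(prompt: str):
    # Stand-in for the provider client: yield completion chunks as they arrive.
    for chunk in ["I'm ", "an AI ", "assistant."]:
        yield chunk


@app.post("/v1/chat/agent")
async def chat(request: AgentRequest):
    prompt = request.prompt_components[0]["payload"]["content"]
    if request.stream:
        # Streamed path: chunked text/event-stream body, as in the curl output below.
        return StreamingResponse(fake_model_stream(prompt), media_type="text/event-stream")
    # Non-streaming path: buffer the completion and return a single JSON document.
    return {"response": "".join([chunk async for chunk in fake_model_stream(prompt)])}
```

The `text/event-stream` content type and chunked transfer encoding visible in the curl output below correspond to the streamed path.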
How to set up and validate locally
- Run AI Gateway:

  ```shell
  poetry run ai_gateway
  ```
With streaming (SSE):

```shell
shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST' 'http://0.0.0.0:5052/v1/chat/agent' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
"prompt_components": [
{
"type": "string",
"metadata": {
"source": "string",
"version": "string"
},
"payload": {
"content": "\n\nHuman: Can you sing a song?\n\nAssistant:",
"provider": "anthropic",
"model": "claude-2.0"
}
}
],
"stream": "True"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 331
>
< HTTP/1.1 200 OK
< date: Fri, 08 Dec 2023 09:03:15 GMT
< server: uvicorn
< content-type: text/event-stream; charset=utf-8
< x-process-time: 1.3012910810011817
< x-request-id: 55399209587445cab54812d446c3f78c
< transfer-encoding: chunked
<
I'm an AI assistant created by Anthropic to be helpful, harmless, and honest. I don't have the ability to sing songs, but I can try to have a pleasant conversation with you.* Connection #0 to host 0.0.0.0 left intact
```

(Notice the `transfer-encoding: chunked` response header and the `text/event-stream` content type, which indicate the response is streamed.)
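The same streamed request can also be exercised from a script instead of curl. This is only an illustrative sketch (the real consumer is GitLab-Rails), assuming the AI Gateway is running locally as described above:

```python
# Illustrative sketch: read the streamed response chunk by chunk.
import requests

payload = {
    "prompt_components": [
        {
            "type": "string",
            "metadata": {"source": "string", "version": "string"},
            "payload": {
                "content": "\n\nHuman: Can you sing a song?\n\nAssistant:",
                "provider": "anthropic",
                "model": "claude-2.0",
            },
        }
    ],
    "stream": True,
}

with requests.post(
    "http://0.0.0.0:5052/v1/chat/agent", json=payload, stream=True, timeout=60
) as response:
    response.raise_for_status()
    # Chunks are printed as soon as the gateway forwards them from the model.
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```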
Without streaming:

```shell
shinya@shinya-XPS-15-9530:~$ curl -v -N -X 'POST' \
'http://0.0.0.0:5052/v1/chat/agent' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt_components": [
{
"type": "string",
"metadata": {
"source": "string",
"version": "string"
},
"payload": {
"content": "\n\nHuman: Hi, How are you?\n\nAssistant:",
"provider": "anthropic",
"model": "claude-2.0"
}
}
],
"stream": "False"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 328
>
< HTTP/1.1 200 OK
< date: Wed, 06 Dec 2023 09:28:34 GMT
< server: uvicorn
< content-length: 130
< content-type: application/json
< x-process-time: 0.983616780002194
< x-request-id: fb210b1a10664a44b7e7574a7e0aa12b
<
{"response":" I'm doing well, thanks for asking!","metadata":{"provider":"anthropic","model":"claude-2.0","timestamp":1701854916}}* Connection #0 to host 0.0.0.0 left intact
Merge request checklist
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.
Further reading
Ideally, we should use community libraries like LangChain instead of reinventing the wheel, which is why I opened !475 (closed). However, given the urgency of this feature, we should go with whichever MR can be approved and merged more quickly.