Follow-up from "Draft: Add streaming"
The following discussion from !129966 (merged) should be addressed:
- @euko started a discussion: (+1 comment) (non-blocking):
I have a question about the buffering logic in Anthropic::Client. It can be addressed as a follow-up, since the code wasn't added in this MR.
We have a read timeout (Gitlab::HTTP), but I wonder if we should also limit how much of the response we buffer in `response_body`, to be safe, since we don't control the response from Anthropic? The limit can be a reasonable amount.

```ruby
# https://gitlab.com/gitlab-org/gitlab/-/blob/01254e6e625b09887f5372aaab1be316d88b4d1d/ee/lib/gitlab/llm/anthropic/client.rb#L34
perform_completion_request(prompt: prompt, options: options.merge(stream: true)) do |parsed_event|
  response_body += parsed_event["completion"] if parsed_event["completion"]

  yield parsed_event if block_given?
end
```
Perhaps someone with more Ruby performance expertise can chime in later.
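
To make the suggestion concrete, here is a minimal sketch of one way to cap the buffered response. It is not taken from the client: `BoundedBuffer`, `ResponseTooLargeError`, `MAX_BUFFERED_BYTES`, and the 1 MiB value are all hypothetical names and numbers chosen for illustration, and the `events` array only stands in for the parsed events yielded by the streaming request.

```ruby
# Hypothetical sketch: none of these names exist in Gitlab::Llm::Anthropic::Client.
class ResponseTooLargeError < StandardError; end

# Assumed cap; the right value is the open question in this discussion.
MAX_BUFFERED_BYTES = 1_048_576 # 1 MiB

# Accumulates streamed completion chunks and raises once the cap would be exceeded.
class BoundedBuffer
  def initialize(limit: MAX_BUFFERED_BYTES)
    @limit = limit
    @buffer = +"" # unary plus gives an unfrozen, mutable string
  end

  # Appends a chunk in place, or raises without modifying the buffer.
  def <<(chunk)
    if @buffer.bytesize + chunk.bytesize > @limit
      raise ResponseTooLargeError, "buffered response exceeds #{@limit} bytes"
    end

    @buffer << chunk
    self
  end

  def to_s
    @buffer.dup
  end
end

# Stand-in for the parsed events a streaming request would yield.
events = [
  { "completion" => "Hello" },
  { "completion" => ", world" },
  { "stop_reason" => "stop_sequence" }
]

response_body = BoundedBuffer.new(limit: 64)
events.each do |parsed_event|
  response_body << parsed_event["completion"] if parsed_event["completion"]
end

puts response_body.to_s
```

On the performance side, `String#<<` appends in place, whereas `+=` allocates a new string on every event, so a bounded buffer along these lines might also reduce allocations for long streams. Whether to raise, truncate, or stop reading when the cap is hit is a design choice this sketch does not settle.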