Pass http execution error to DWS and improve logs
What does this MR do and why?
Improved logging
We're investigating connection drops in Workhorse when running DAP flows. We would like to improve the logs because:
- If `serveHTTPSafe` fails, `Sending HTTP response event` isn't printed, so we can't see any logs for that request id.
- Errors in `nullResponseWriter` don't carry a correlation id, which makes debugging difficult.
Passing the error to DWS
- Previously, if HTTP request execution failed, we stopped execution and dropped the connection. Now we send the error back to DWS so that the LLM can act on it.
This MR improves logging and passes the error back to DWS.
References
Related to gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#1789
How to set up and validate locally
- Apply this diff temporarily to simulate a large response from a GraphQL request:
--- a/ee/app/presenters/ai/duo_workflows/workflow_checkpoint_event_presenter.rb
+++ b/ee/app/presenters/ai/duo_workflows/workflow_checkpoint_event_presenter.rb
@@ -8,6 +8,11 @@ class WorkflowCheckpointEventPresenter < Gitlab::View::Presenter::Delegated
DuoMessage = Struct.new(:content, :message_type, :status, :tool_info,
:timestamp, :correlation_id, :role, keyword_init: true)
+ # TODO: Remove this - temporary override to reproduce large response bug
+ def metadata
+ { "dummy_payload" => "x" * (5 * 1024 * 1024) }.to_json
+ end
+
gdk restart rails-web
cd workhorse && make gitlab-workhorse && gdk restart workhorse
- Run a chat flow. It should finish normally. Send one more message and check the Workhorse logs.
- You'll see logs like `nullResponseWriter: partial write due to size limit` with a correlation id and request id.
- You'll no longer see the `Cancelled` error in DWS. You may see an Internal error instead, since a flow cannot be resumed without the initial GraphQL call.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Edited by Halil Coban