Duo Workflow: memory growth from checkpoint messages in the outbox during long-running workflows
Problem
We have logic at https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/7efaaf208afd0f00df33fb268e82d78469cca589/duo_workflow_service/executor/outbox.py#L40-L41 that tracks every outgoing message's requestID in memory so that, when a response comes back, we can resolve the corresponding future (the async mechanism for responding to a function call). The problem is that outgoing newCheckpoint messages, which come from the checkpoint notifier, never receive a response. This is by design: we don't want clients wasting bandwidth responding to streaming updates.
As a result, the longer a workflow runs, the more of these outgoing checkpoint entries accumulate in the _action_response and _legacy_action_response dictionaries. The overhead is likely negligible for short workflows, but a workflow that runs for several minutes or longer might send tens of thousands of newCheckpoint messages, and the accumulated entries can eventually cause memory pressure on the Python server.
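A minimal sketch of the leak mechanism described above (hypothetical class and method names; the real outbox.py implementation differs). Entries keyed by requestID are only removed when a response arrives, so fire-and-forget checkpoint messages leave entries behind forever:

```python
import asyncio


class Outbox:
    """Sketch of the request/response tracking pattern; names are illustrative."""

    def __init__(self) -> None:
        # requestID -> Future that is resolved when the client responds
        self._action_response: dict[str, asyncio.Future] = {}

    def put_action(self, request_id: str) -> asyncio.Future:
        # Every outgoing message registers a future, even ones
        # (like newCheckpoint) that will never receive a response.
        fut = asyncio.get_running_loop().create_future()
        self._action_response[request_id] = fut
        return fut

    def handle_response(self, request_id: str, response: object) -> None:
        # A response removes the entry and resolves the future.
        fut = self._action_response.pop(request_id, None)
        if fut is not None and not fut.done():
            fut.set_result(response)


async def demo() -> int:
    outbox = Outbox()
    # An action that gets a response is cleaned up:
    outbox.put_action("action-1")
    outbox.handle_response("action-1", "done")
    # Checkpoint notifications never get a response, so their entries leak:
    for i in range(10_000):
        outbox.put_action(f"checkpoint-{i}")
    return len(outbox._action_response)


leaked = asyncio.run(demo())
print(leaked)  # entries that will never be removed
```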
Solution
We shouldn't be putting newCheckpoint messages into those dictionaries at all, since they never receive a response. We can either introduce a new function such as put_action_without_response (which does not update the dictionaries) or check the message type and skip newCheckpoint messages.
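Both options can be sketched as follows (illustrative names only; `NEW_CHECKPOINT`, the `type` field, and the queue are assumptions, not the real outbox.py API):

```python
import asyncio

NEW_CHECKPOINT = "newCheckpoint"  # assumed type tag; check the real message schema


class Outbox:
    def __init__(self) -> None:
        self._action_response: dict[str, asyncio.Future] = {}
        self.queue: list[dict] = []  # stand-in for the real send queue

    def put_action(self, request_id: str, message: dict):
        # Option B: filter by message type and skip response tracking.
        if message.get("type") == NEW_CHECKPOINT:
            self.queue.append(message)
            return None  # no future: nothing to resolve, nothing retained
        fut = asyncio.get_running_loop().create_future()
        self._action_response[request_id] = fut
        self.queue.append(message)
        return fut

    def put_action_without_response(self, message: dict) -> None:
        # Option A: explicit fire-and-forget entry point for callers
        # (e.g. the checkpoint notifier) that never expect a reply.
        self.queue.append(message)


async def demo() -> tuple[int, int]:
    outbox = Outbox()
    outbox.put_action("cp-1", {"type": NEW_CHECKPOINT})
    outbox.put_action_without_response({"type": NEW_CHECKPOINT})
    outbox.put_action("act-1", {"type": "action"})
    # All three messages are queued, but only the real action is tracked.
    return len(outbox.queue), len(outbox._action_response)


sent, tracked = asyncio.run(demo())
print(sent, tracked)
```

Option A makes the fire-and-forget intent explicit at the call site, while Option B keeps a single entry point but hides the special case inside the outbox; either way the dictionaries only ever hold messages that can actually be resolved.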