Reduce Duo Workflow Service checkpoint payload sizes
## Problem
Our checkpoints are regularly getting very large (into the megabytes). This creates multiple problems:
- Significant added latency when running flows
- We have a 4MiB limit on payloads that can be proxied by the executor (via gRPC), which means flows regularly fail
- This consumes a lot of storage in Postgres
- This is costing us in egress traffic
- Large JSON payloads are likely consuming a lot of CPU on the Duo Workflow Service. Since this is async Python code intended to run many parallel flows, this may limit our scaling or cause undesirable latency spikes by blocking the single-threaded async event loop (see the sketch after this list)
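On that last point, one mitigation worth considering while payloads remain large is to move the CPU-bound serialization off the event loop. A minimal sketch, assuming checkpoints are plain JSON-serializable dicts; `save_checkpoint` and `store` are hypothetical names, not the actual Duo Workflow Service API:

```python
# Minimal sketch (hypothetical names): keep multi-megabyte JSON
# serialization off the single-threaded event loop.
import asyncio
import json
from concurrent.futures import ProcessPoolExecutor

_pool = ProcessPoolExecutor()

def _serialize(checkpoint: dict) -> bytes:
    # json.dumps on a large dict is CPU-bound and holds the GIL, so we
    # run it in a separate process. Note the dict itself still has to
    # be pickled across the process boundary, which is not free either.
    return json.dumps(checkpoint).encode("utf-8")

async def save_checkpoint(checkpoint: dict, store) -> None:
    loop = asyncio.get_running_loop()
    payload = await loop.run_in_executor(_pool, _serialize, checkpoint)
    await store.write(payload)  # hypothetical async storage client
```

This treats the symptom, not the cause; shrinking the checkpoints themselves is still the primary fix.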
## Solution
We should work out what the essential parts of the checkpoint are and try to trim them down. Since our context limit is at most 1M tokens, it seems unlikely we'll ever need to keep more than 1MiB of checkpoint data. Additionally, we are consistently seeing 100KB+ checkpoints even when the flow is doing very little. What is all this data being used for?
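One way to answer that question is to measure which fields dominate the payload. A minimal sketch, assuming checkpoints are stored as JSON objects (the actual field names will vary):

```python
# Minimal sketch: report serialized size per top-level checkpoint key,
# largest first, to see which fields dominate the payload.
import json
import sys

def checkpoint_size_report(raw: bytes) -> list[tuple[str, int]]:
    checkpoint = json.loads(raw)
    sizes = {
        key: len(json.dumps(value).encode("utf-8"))
        for key, value in checkpoint.items()
    }
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for key, size in checkpoint_size_report(f.read()):
            print(f"{size:>10} B  {key}")
```

Running this against a dump of real checkpoints should quickly show whether a handful of fields account for the 100KB+ baseline.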
We should also see if this work relates to gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#1057. It's possible we're sending duplicated data every time we save a checkpoint, and similarly, every time we fetch checkpoints we're probably receiving duplicates across all of them.
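One way to test the duplication hypothesis: fingerprint every top-level value across a set of checkpoints and count repeats. A minimal sketch, again assuming JSON-object checkpoints; `duplicate_report` is a hypothetical helper:

```python
# Minimal sketch: hash every top-level value across a set of
# checkpoints; any (key, digest) pair seen more than once is data we
# are serializing, storing, and shipping repeatedly.
import hashlib
import json
from collections import Counter

def duplicate_report(checkpoints: list[dict]) -> Counter:
    seen: Counter = Counter()
    for checkpoint in checkpoints:
        for key, value in checkpoint.items():
            blob = json.dumps(value, sort_keys=True).encode("utf-8")
            seen[(key, hashlib.sha256(blob).hexdigest())] += 1
    return seen
```

If the counts are high, delta-encoding checkpoints or storing shared blobs once and referencing them would address both the storage and the egress costs at the same time.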