Reduce Duo Workflow Service checkpoint payload sizes
## Problem
Our checkpoints are regularly getting very large (into the megabytes). This creates multiple problems:
- Significant added latency when running flows
- We have a 4MiB limit on payloads that can be proxied by the executor (via gRPC), which means flows regularly fail
- This consumes a lot of storage in Postgres
- This is costing us in egress traffic
- Large JSON payloads are likely consuming a lot of CPU on the Duo Workflow Service. Since this is async Python code intended to run many parallel flows, this may limit our scaling or cause undesirable latency spikes by blocking the single-threaded async event loop (see the sketch after this list)
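On that last point, one mitigation worth considering while payloads remain large is to move the CPU-bound serialization off the event loop. A minimal sketch, assuming checkpoints are plain JSON-serializable dicts; `save_checkpoint` and `store` are hypothetical names, not the actual Duo Workflow Service API:

```python
# Minimal sketch (hypothetical names): keep multi-megabyte JSON
# serialization off the single-threaded event loop.
import asyncio
import json
from concurrent.futures import ProcessPoolExecutor

_pool = ProcessPoolExecutor()

def _serialize(checkpoint: dict) -> bytes:
    # json.dumps on a large dict is CPU-bound and holds the GIL, so we
    # run it in a separate process. Note the dict itself still has to
    # be pickled across the process boundary, which is not free either.
    return json.dumps(checkpoint).encode("utf-8")

async def save_checkpoint(checkpoint: dict, store) -> None:
    loop = asyncio.get_running_loop()
    payload = await loop.run_in_executor(_pool, _serialize, checkpoint)
    await store.write(payload)  # hypothetical async storage client
```

This treats the symptom, not the cause; shrinking the checkpoints themselves is still the primary fix.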
## Solution
We should work out what the essential parts of the checkpoint are and try to trim them down. Since our context limit is at most 1M tokens, it seems unlikely we'll ever need to keep more than 1MiB of checkpoint data. Additionally, we are consistently seeing 100KB+ checkpoints even when the flow is doing very little. What is all this data being used for?
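One way to answer that question is to measure which fields dominate the payload. A minimal sketch, assuming checkpoints are stored as JSON objects (the actual field names will vary):

```python
# Minimal sketch: report serialized size per top-level checkpoint key,
# largest first, to see which fields dominate the payload.
import json
import sys

def checkpoint_size_report(raw: bytes) -> list[tuple[str, int]]:
    checkpoint = json.loads(raw)
    sizes = {
        key: len(json.dumps(value).encode("utf-8"))
        for key, value in checkpoint.items()
    }
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for key, size in checkpoint_size_report(f.read()):
            print(f"{size:>10} B  {key}")
```

Running this against a dump of real checkpoints should quickly show whether a handful of fields account for the 100KB+ baseline.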
We should also see if this work relates to gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#1057. It's possible we're sending duplicated data every time we save a checkpoint, and similarly, every time we fetch checkpoints we're probably receiving duplicates across all of them.
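One way to test the duplication hypothesis: fingerprint every top-level value across a set of checkpoints and count repeats. A minimal sketch, again assuming JSON-object checkpoints; `duplicate_report` is a hypothetical helper:

```python
# Minimal sketch: hash every top-level value across a set of
# checkpoints; any (key, digest) pair seen more than once is data we
# are serializing, storing, and shipping repeatedly.
import hashlib
import json
from collections import Counter

def duplicate_report(checkpoints: list[dict]) -> Counter:
    seen: Counter = Counter()
    for checkpoint in checkpoints:
        for key, value in checkpoint.items():
            blob = json.dumps(value, sort_keys=True).encode("utf-8")
            seen[(key, hashlib.sha256(blob).hexdigest())] += 1
    return seen
```

If the counts are high, delta-encoding checkpoints or storing shared blobs once and referencing them would address both the storage and the egress costs at the same time.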