Normalise checkpoints table with checkpoint_blobs

Background

This is a follow-up to gitlab-org/duo-workflow/duo-workflow-service#100 (closed)

While exploring the official PostgreSQL checkpointer implementation for LangGraph v0.2, it was discovered that it creates one additional relation in the database: checkpoint_blobs. When asked in a follow-up question about the purpose of checkpoint_blobs, LangChain engineers responded:

checkpoint blobs table stores the actual contents of each state key, at each version. This is an optimization, which enables saving only the state keys that changed at each step. When there is a checkpoint blobs table the checkpoints table is used to store only metadata. This optimization is optional, but I'd recommend applying it

This was followed up with a confirmation:

checkpoint blobs is a way to normalise data by extracting channel values into a separate table with a 1-to-N relationship with checkpoints. With that in mind, the new_versions: ChannelVersions parameter of the put method in the BaseCheckpointSaver interface indicates which channel versions were updated, which helps select the correct key-value pairs from the channel_values key in the checkpoint
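The selection described above could look roughly like the sketch below. It is only an illustration: the helper name split_checkpoint, the plain-dict checkpoint shape and the blob row layout are assumptions made for this example, not the official checkpointer code. Channels listed in new_versions that have no entry in channel_values are simply skipped here.

```python
from typing import Any


def split_checkpoint(
    checkpoint: dict[str, Any], new_versions: dict[str, Any]
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split a checkpoint into metadata and blob rows (hypothetical helper).

    Only channels listed in new_versions produce a blob row, so values that
    did not change in this step are not stored again.
    """
    channel_values = checkpoint.get("channel_values", {})
    blobs = [
        {"channel": channel, "version": version, "value": channel_values[channel]}
        for channel, version in new_versions.items()
        if channel in channel_values  # simplification: skip channels without a value
    ]
    # The checkpoints row keeps everything except the channel values themselves.
    slim = {k: v for k, v in checkpoint.items() if k != "channel_values"}
    return slim, blobs


checkpoint = {
    "id": "c2",
    "channel_versions": {"messages": 2, "plan": 1},
    "channel_values": {"messages": ["hi", "there"], "plan": "draft"},
}
# Only "messages" changed in this step, so only it becomes a blob row.
slim, blobs = split_checkpoint(checkpoint, new_versions={"messages": 2})
```

The unchanged "plan" value stays reachable because channel_versions in the slim checkpoint still points at version 1, which was stored by an earlier step.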

Goal

Consider normalising the checkpoints relation to reduce the size of stored data by creating checkpoint_blobs, similar to how it is done in the official PostgreSQL checkpointer

Implementation

  1. Create a new checkpoint_blobs relation in the GitLab PostgreSQL database (official table schema for reference)
  2. Adjust the GitLab Rails checkpoints API to handle blobs and checkpoints correctly
  3. Update the aput method in the gitlab_saver.py file in Duo Workflow Service to use the new_versions parameter to correctly extract modified channels from the checkpoint and save them into checkpoint_blobs
  4. Adjust the GitLab checkpoints read API to use a join query to extract data from all checkpoints tables, in a similar manner to the official Postgres checkpointer
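As a rough illustration of the read side in step 4, the sketch below rebuilds channel_values for a checkpoint by matching its channel versions against checkpoint_blobs. Everything here is an assumption for demonstration purposes: it uses an in-memory SQLite database instead of the GitLab PostgreSQL database, a simplified two-table schema rather than the official one, and a hypothetical load_channel_values helper. It also resolves each (channel, version) pair with a separate lookup, where a PostgreSQL implementation would express the same match as a single join query.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checkpoints (
    thread_id TEXT NOT NULL,
    checkpoint_id TEXT NOT NULL,
    channel_versions TEXT NOT NULL,  -- JSON map: channel -> version
    PRIMARY KEY (thread_id, checkpoint_id)
);
CREATE TABLE checkpoint_blobs (
    thread_id TEXT NOT NULL,
    channel TEXT NOT NULL,
    version TEXT NOT NULL,
    blob TEXT NOT NULL,  -- JSON-encoded channel value
    PRIMARY KEY (thread_id, channel, version)
);
""")

# Two checkpoints in one thread: "plan" did not change between c1 and c2,
# so both point at the same blob row and only three blobs are stored.
conn.executemany("INSERT INTO checkpoints VALUES (?, ?, ?)", [
    ("t1", "c1", json.dumps({"messages": 1, "plan": 1})),
    ("t1", "c2", json.dumps({"messages": 2, "plan": 1})),
])
conn.executemany("INSERT INTO checkpoint_blobs VALUES (?, ?, ?, ?)", [
    ("t1", "messages", "1", json.dumps(["hi"])),
    ("t1", "plan", "1", json.dumps("draft")),
    ("t1", "messages", "2", json.dumps(["hi", "there"])),
])


def load_channel_values(thread_id: str, checkpoint_id: str):
    """Rebuild channel_values for one checkpoint (hypothetical helper)."""
    row = conn.execute(
        "SELECT channel_versions FROM checkpoints"
        " WHERE thread_id = ? AND checkpoint_id = ?",
        (thread_id, checkpoint_id),
    ).fetchone()
    if row is None:
        return None
    values = {}
    for channel, version in json.loads(row[0]).items():
        blob = conn.execute(
            "SELECT blob FROM checkpoint_blobs"
            " WHERE thread_id = ? AND channel = ? AND version = ?",
            (thread_id, channel, str(version)),
        ).fetchone()
        if blob is not None:
            values[channel] = json.loads(blob[0])
    return values


latest = load_channel_values("t1", "c2")
```

With this layout, each additional checkpoint adds one small metadata row plus blob rows only for the channels that changed, which is where the expected size reduction comes from.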