Normalise checkpoints table with checkpoint_blobs
Background
This is a follow-up to gitlab-org/duo-workflow/duo-workflow-service#100 (closed)
While exploring the official PostgreSQL checkpointer implementation for LangGraph v0.2, we discovered that it creates one additional relation in the database: checkpoint_blobs. When asked in a follow-up question about the purpose of checkpoint_blobs, LangChain engineers responded:
The checkpoint blobs table stores the actual contents of each state key, at each version. This is an optimization, which enables saving only the state keys that changed at each step. When there is a checkpoint blobs table, the checkpoints table is used to store only metadata. This optimization is optional, but I'd recommend applying it.
This was followed up with a confirmation:
Checkpoint blobs is a way to normalise data by extracting channel values into a separate table with a 1-to-N relationship with checkpoints. With that in mind, the new_versions: ChannelVersions parameter of the put method in the BaseCheckpointSaver interface indicates which channel versions were updated, which helps select the correct key-value pairs from the channel_values key in the checkpoint.
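To illustrate how new_versions can drive the selection, here is a minimal Python sketch. It assumes a checkpoint is a plain dict with a channel_values key and that new_versions maps channel names to their new versions. The helper name extract_changed_blobs and the row layout are hypothetical and are not part of LangGraph or of the Duo Workflow Service.

```python
from typing import Any, Dict, List, Tuple, Union

# Simplified stand-in for the ChannelVersions type: channel name -> version.
ChannelVersions = Dict[str, Union[str, int, float]]


def extract_changed_blobs(
    checkpoint: Dict[str, Any],
    new_versions: ChannelVersions,
) -> List[Tuple[str, str, Any]]:
    """Return (channel, version, value) rows for channels updated in this step.

    Hypothetical helper: only channels listed in new_versions are returned,
    so unchanged channel values are not written again.
    """
    values = checkpoint.get("channel_values", {})
    rows = []
    for channel, version in new_versions.items():
        # This sketch skips channels that have a new version but no value.
        if channel in values:
            rows.append((channel, str(version), values[channel]))
    return rows
```

With a checkpoint holding three channels and new_versions naming two of them, only those two channels produce rows; the third stays referenced by its previously stored version.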
Goal
Consider normalising the checkpoints relation to reduce the size of stored data by creating checkpoint_blobs, similar to how it is done in the official PostgreSQL driver.
Implementation
- Create a new checkpoint_blobs relation in the GitLab PostgreSQL db (official table schema for reference)
- Adjust the GitLab Rails checkpoints API to handle blobs and checkpoints correctly
- Update the aput method in the gitlab_saver.py file in Duo Workflow Service to use the new_versions parameter to correctly extract modified channels from checkpoint and save them into checkpoint_blobs
- Adjust the read GitLab checkpoints API to use a join query to extract data from all checkpoints tables, in a similar manner to the official Postgres checkpointer
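The write and read paths listed above can be sketched end to end as follows. This is an illustration under stated assumptions, not the proposed implementation: it uses an in-memory SQLite database as a stand-in for PostgreSQL, a heavily simplified schema (the column set is an assumption, not the official one), and hypothetical helper names save_checkpoint and load_channel_values. It also relies on SQLite's json_each function being available in the local build.

```python
import json
import sqlite3

# Simplified, assumed schema: checkpoints keeps only metadata (here, the
# channel -> version map); checkpoint_blobs keeps one row per channel version.
SCHEMA = """
CREATE TABLE checkpoints (
    thread_id        TEXT NOT NULL,
    checkpoint_id    TEXT NOT NULL,
    channel_versions TEXT NOT NULL,  -- JSON object: channel -> version
    PRIMARY KEY (thread_id, checkpoint_id)
);
CREATE TABLE checkpoint_blobs (
    thread_id TEXT NOT NULL,
    channel   TEXT NOT NULL,
    version   TEXT NOT NULL,
    blob      TEXT NOT NULL,         -- JSON-encoded channel value
    PRIMARY KEY (thread_id, channel, version)
);
"""


def save_checkpoint(conn, thread_id, checkpoint_id, checkpoint, new_versions):
    """Store checkpoint metadata plus blobs for the changed channels only."""
    versions = {ch: str(v) for ch, v in checkpoint["channel_versions"].items()}
    conn.execute(
        "INSERT INTO checkpoints VALUES (?, ?, ?)",
        (thread_id, checkpoint_id, json.dumps(versions)),
    )
    for channel, version in new_versions.items():
        value = checkpoint["channel_values"][channel]
        conn.execute(
            "INSERT OR IGNORE INTO checkpoint_blobs VALUES (?, ?, ?, ?)",
            (thread_id, channel, str(version), json.dumps(value)),
        )


def load_channel_values(conn, thread_id, checkpoint_id):
    """Rebuild channel_values by joining the version map with the blobs."""
    rows = conn.execute(
        """
        SELECT b.channel, b.blob
        FROM checkpoints AS c, json_each(c.channel_versions) AS je
        JOIN checkpoint_blobs AS b
          ON b.thread_id = c.thread_id
         AND b.channel = je.key
         AND b.version = je.value
        WHERE c.thread_id = ? AND c.checkpoint_id = ?
        """,
        (thread_id, checkpoint_id),
    ).fetchall()
    return {channel: json.loads(blob) for channel, blob in rows}
```

In this sketch, saving a second checkpoint in which only one of two channels changed adds a single blob row, while reading either checkpoint still returns both channel values, because the unchanged channel resolves to the blob written in the earlier step.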