Avoid loading unneeded columns during delete

Cal Pratt requested to merge cpratt34/bulk-update-records into master

Add more detailed logging to record cleanup, and avoid loading unneeded columns during delete.

Before this change, the SELECT statements used to mark rows as deleted also fetched the inlined bytes for every matching row. Depending on the configuration, that can be a lot of data. Using session.query(IndexEntry).options(load_only("digest_hash", "digest_size_bytes")) restricts the query to the two columns the cleanup actually needs, so the inlined bytes are never transferred. This should give a significant speedup when the matching rows carry many inlined bytes.

The resulting queries made in the backend now look like the following (which exclude the inlined bytes):

SELECT index.digest_hash AS index_digest_hash, index.digest_size_bytes AS index_digest_size_bytes 
FROM index 
WHERE index.deleted = false 
AND index.accessed_timestamp < %(accessed_timestamp_1)s 
AND index.accessed_timestamp >= %(accessed_timestamp_2)s 
ORDER BY index.accessed_timestamp ASC 
FOR UPDATE SKIP LOCKED
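
As a rough illustration of the load_only change, here is a minimal, self-contained sketch. It is not the code from this merge request: the IndexEntry model below is a hypothetical stand-in built from the column names visible in the query above, the inline_blob column name is an assumption, and an in-memory SQLite database replaces the real backend. It passes mapped attributes to load_only instead of strings, which should work on both SQLAlchemy 1.4 and 2.0.

```python
import datetime

from sqlalchemy import (Boolean, Column, DateTime, Integer, LargeBinary,
                        String, create_engine, inspect)
from sqlalchemy.orm import Session, declarative_base, load_only

Base = declarative_base()


class IndexEntry(Base):
    # Hypothetical stand-in for the real model; not taken from the project.
    __tablename__ = "index"
    digest_hash = Column(String, primary_key=True)
    digest_size_bytes = Column(Integer, nullable=False)
    accessed_timestamp = Column(DateTime)
    deleted = Column(Boolean, default=False)
    inline_blob = Column(LargeBinary)  # assumed name for the inlined bytes


# In-memory SQLite is used only so the sketch runs anywhere.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(IndexEntry(
        digest_hash="abc123",
        digest_size_bytes=4,
        accessed_timestamp=datetime.datetime(2024, 1, 1),
        deleted=False,
        inline_blob=b"\x00" * 4,
    ))
    session.commit()

with Session(engine) as session:
    full_query = session.query(IndexEntry)
    # Restrict the SELECT to the two columns the cleanup needs.
    slim_query = session.query(IndexEntry).options(
        load_only(IndexEntry.digest_hash, IndexEntry.digest_size_bytes)
    )
    full_sql = str(full_query)
    slim_sql = str(slim_query)

    entry = slim_query.filter(IndexEntry.deleted.is_(False)).one()
    loaded_hash = entry.digest_hash
    loaded_size = entry.digest_size_bytes
    # Names of attributes that were NOT fetched by the query.
    unloaded = set(inspect(entry).unloaded)
```

In this sketch, inspect(entry).unloaded lists the attributes that the SELECT skipped. Accessing one of them later (for example entry.inline_blob) would trigger a separate lazy load, so the saving only holds if the cleanup path never touches the blob.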
Edited by Cal Pratt