Replace each_batch PruneExpiredExportJobsService with scalable queries
What does this MR do and why?
This MR removes the plain .each_batch
queries added in Prune old RelationExport and RelationExportUplo... (!142246 - merged) after a database review noted that it wouldn't scale well for high volumes of data. Instead, a new index on project_export_uploads
for updated_at
and id
was added to efficiently delete batches of ProjectExportJobs in a loop. Then, keyset-based loop was implemented to delete Uploads associated with the expired export jobs.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
Same as Prune old RelationExport and RelationExportUplo... (!142246 - merged). No actual functionality was changed.
Queries
deleted_count = ProjectExportJob.prunable.limit(1000).delete_all
Raw SQL:
DELETE FROM project_export_jobs WHERE project_export_jobs.id IN (SELECT project_export_jobs.id FROM project_export_jobs WHERE (updated_at < '2024-02-19 17:15:14.211858') LIMIT 1000)
prunable_uploads.delete_all
Raw SQL without keyset iteration:
SELECT "uploads".* FROM "uploads" WHERE "uploads"."model_type" = 'Projects::ImportExport::RelationExportUpload' AND "uploads"."model_id" IN (SELECT "project_relation_export_uploads"."id" FROM "project_relation_export_uploads" INNER JOIN "project_relation_exports" "relation_export" ON "relation_export"."id" = "project_relation_export_uploads"."project_relation_export_id" WHERE "relation_export"."project_export_job_id" IN (SELECT "project_export_jobs"."id" FROM "project_export_jobs" WHERE (updated_at < '2024-02-19 17:15:14.211858')));
Query execution plan. This plan should be the same as the MR that introduced it, however, this time it uses keyset pagination instead of EachBatch
.
Related to #441698 (closed)