Add worker to prune old export jobs

What does this MR do and why?

There are millions of old ProjectExportJob records in the database that need pruning.

This change adds a worker that deletes any records older than 7 days.
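For illustration, a minimal sketch of what such a worker could look like, assuming a hypothetical class name and GitLab's EachBatch helper (the MR's actual naming, scheduling, and worker attributes may differ):

# frozen_string_literal: true

class PruneOldExportJobsWorker # hypothetical name
  include ApplicationWorker

  idempotent!

  EXPIRES_IN = 7.days
  BATCH_SIZE = 1_000

  def perform
    # Walk the table in primary-key order and delete each batch
    # separately, so every DELETE statement stays short-lived.
    ProjectExportJob
      .where('updated_at < ?', EXPIRES_IN.ago)
      .each_batch(of: BATCH_SIZE) { |batch| batch.delete_all }
  end
end

Batching the deletes bounds each statement's runtime and lock footprint, which matters on a table with millions of stale rows.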

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13698/commands/48086

Query

explain SELECT "project_export_jobs"."id" FROM "project_export_jobs" WHERE (updated_at < '2022-11-29 12:44:30.585313') ORDER BY "project_export_jobs"."id" ASC;

Plan with execution

Index Scan using project_export_jobs_pkey on public.project_export_jobs  (cost=0.42..334614.27 rows=49495 width=8) (actual time=17.994..7923.438 rows=49501 loops=1)
   Filter: (project_export_jobs.updated_at < '2022-11-29 12:44:30.585313+00'::timestamp with time zone)
   Rows Removed by Filter: 110357
   Buffers: shared hit=144921 read=7424 dirtied=175
   I/O Timings: read=7445.582 write=0.000

Recommendations

❗️ Query processes too much data to return a relatively small number of rows. Reduce data cardinality as early as possible during execution, using one or more of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. (Details: http://momjian.us/main/writings/pgsql/hw_performance/)
❗️ Add LIMIT. The number of rows in the result set is too big; limit the number of rows. (Details: https://postgres.ai/#tip-add-limit)
❗️ Specialized index needed. The indexes currently used do not serve this query well (notice Rows Removed by Filter, meaning the index fetched many non-target rows). Consider adding a more specialized index; a sketch of one follows this list. (Details: https://postgres.ai/#tip-index-inefficient-high-filtered)
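To act on the specialized-index recommendation, a migration along these lines could add an index on updated_at. The class name, index name, and exact definition here are assumptions for illustration, not part of this MR:

# frozen_string_literal: true

class AddIndexToProjectExportJobsUpdatedAt < Gitlab::Database::Migration[2.0]
  disable_ddl_transaction!

  INDEX_NAME = 'index_project_export_jobs_on_updated_at'

  def up
    # Built concurrently to avoid blocking writes on a busy table.
    add_concurrent_index :project_export_jobs, :updated_at, name: INDEX_NAME
  end

  def down
    remove_concurrent_index_by_name :project_export_jobs, INDEX_NAME
  end
end

With such an index, the planner could target only rows matching the updated_at predicate instead of fetching rows via the primary key and filtering out ~110k non-target rows, as the plan above shows.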
Statistics

Time: 7.929 s
  - planning: 0.719 ms
  - execution: 7.928 s
    - I/O read: 7.446 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 144921 (~1.10 GiB) from the buffer pool
  - reads: 7424 (~58.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 175 (~1.40 MiB)
  - writes: 0
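For reference, buffer counts translate to sizes at PostgreSQL's default 8 KiB block size: 144921 hits × 8 KiB ≈ 1.10 GiB, and 7424 reads × 8 KiB ≈ 58 MiB.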

A batched background migration has already been run on production to perform an initial prune of ~9 million records: https://gitlab.slack.com/archives/CB2S7NNDP/p1670251122239449

ID: 336
Job class: PruneStaleProjectExportJobs
Table: project_export_jobs
Status: finished
Progress: 100%
Created at: 2022-12-02T01:22:32.760Z
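For context, a rough sketch of what a batched background migration job of this shape typically looks like, assuming GitLab's BatchedMigrationJob API; this is not the actual PruneStaleProjectExportJobs source, and the cutoff handling is an assumption:

# frozen_string_literal: true

module Gitlab
  module BackgroundMigration
    # Sketch only: deletes stale rows one sub-batch at a time.
    class PruneStaleProjectExportJobs < BatchedMigrationJob
      EXPIRES_IN = 7.days

      operation_name :delete_all

      def perform
        each_sub_batch do |sub_batch|
          sub_batch.where('updated_at < ?', EXPIRES_IN.ago).delete_all
        end
      end
    end
  end
end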

Before migration:

[ gprd ] production> ProjectExportJob.last.id
=> 9628218
[ gprd ] production> ProjectExportJob.where("updated_at < ?", 7.days.ago).size
=> 9138998

After migration:

[ gprd ] production> ProjectExportJob.last.id
=> 9800821
[ gprd ] production> ProjectExportJob.where("updated_at < ?", 7.days.ago).size
=> 36302

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #351253 (closed)
