Fix partition missing error in project_daily_statistics backfill

What does this MR do and why?

Related: PostgreSQL error when upgrading from 18.5.3 -> ... (#582526 - closed) BackfillPartitionedProjectDailyStatistics error... (#582512 - closed)

This MR fixes partition errors when Self-managed/Dedicated customers upgrade to GitLab 18.5

Problem:

  • Migration has hardcoded START_DATE = Date.new(2025, 8, 1)
  • Model has retain_for: 3.months
  • For GitLab.com (upgraded Aug): August within 3 months, works
  • For self-managed (upgrading Dec): August 4 months old, partition deleted by partition manager
  • Backfill tries to copy August data -> ERROR: no partition found

Root cause: Hardcoded START_DATE assumes controlled rollout timing, breaks for self-managed deployments upgrading months later.

Solution (two-part fix):

  1. Dynamic START_DATE: Use 3.months.ago.beginning_of_month instead of hardcoded date to respect retention policy at migration time

  2. Filter in background job: Skip records outside retention window during backfill to handle already-enqueued jobs on affected instances

Query Plan

The link to postres.ai

After the swap those table names have been changed. So, for the query plan I used project_daily_statistics_archived

`project_daily_statistics` --> `project_daily_statistics_archived`,

`project_daily_statistics_b8088ecbd2` --> `project_daily_statistics`
Click to expand: query plan
INSERT INTO project_daily_statistics (id, project_id, fetch_count, date)
  SELECT id, project_id, fetch_count, date
  FROM project_daily_statistics_archived
  WHERE project_daily_statistics_archived.id IN (
      SELECT project_daily_statistics_archived.id
      FROM project_daily_statistics_archived
      WHERE project_daily_statistics_archived.id BETWEEN 4343208907 AND 4343210907
      LIMIT 1000
    )
    AND project_daily_statistics_archived.date >= '2025-09-01'
    LIMIT 1000
  FOR UPDATE
  ON CONFLICT (id, date) DO NOTHING
 ModifyTable on public.project_daily_statistics  (cost=50.98..3659.43 rows=0 width=0) (actual time=11.560..11.564 rows=0 loops=1)
   Buffers: shared hit=12291
   WAL: records=1000 fpi=0 bytes=54000
   I/O Timings: read=0.000 write=0.000
   ->  Subquery Scan on *SELECT*  (cost=50.98..3659.43 rows=201 width=24) (actual time=1.520..7.066 rows=1000 loops=1)
         Buffers: shared hit=7176
         WAL: records=1000 fpi=0 bytes=54000
         I/O Timings: read=0.000 write=0.000
         ->  Limit  (cost=50.98..3657.42 rows=201 width=58) (actual time=1.519..6.883 rows=1000 loops=1)
               Buffers: shared hit=7176
               WAL: records=1000 fpi=0 bytes=54000
               I/O Timings: read=0.000 write=0.000
               ->  LockRows  (cost=50.98..3657.42 rows=201 width=58) (actual time=1.518..6.754 rows=1000 loops=1)
                     Buffers: shared hit=7176
                     WAL: records=1000 fpi=0 bytes=54000
                     I/O Timings: read=0.000 write=0.000
                     ->  Nested Loop  (cost=50.98..3655.41 rows=201 width=58) (actual time=1.489..5.472 rows=1000 loops=1)
                           Buffers: shared hit=5313
                           I/O Timings: read=0.000 write=0.000
                           ->  HashAggregate  (cost=50.41..60.41 rows=1000 width=40) (actual time=1.463..1.762 rows=1000 loops=1)
                                 Group Key: "ANY_subquery".id
                                 Buffers: shared hit=313
                                 I/O Timings: read=0.000 write=0.000
                                 ->  Subquery Scan on ANY_subquery  (cost=0.57..47.91 rows=1000 width=40) (actual time=0.033..1.162 rows=1000 loops=1)
                                       Buffers: shared hit=313
                                       I/O Timings: read=0.000 write=0.000
                                       ->  Limit  (cost=0.57..37.91 rows=1000 width=8) (actual time=0.023..0.943 rows=1000 loops=1)
                                             Buffers: shared hit=313
                                             I/O Timings: read=0.000 write=0.000
                                             ->  Index Only Scan using project_daily_statistics_archived_pkey on public.project_daily_statistics_archived project_daily_statistics_archived_1  (cost=0.57..71.28 rows=1894 width=8) (actual time=0.023..0.854 rows=1000 loops=1)
                                                   Index Cond: ((project_daily_statistics_archived_1.id >= '4343208907'::bigint) AND (project_daily_statistics_archived_1.id <= '4343210907'::bigint))
                                                   Heap Fetches: 137
                                                   Buffers: shared hit=313
                                                   I/O Timings: read=0.000 write=0.000
                           ->  Index Scan using project_daily_statistics_archived_pkey on public.project_daily_statistics_archived  (cost=0.57..3.60 rows=1 width=26) (actual time=0.003..0.003 rows=1 loops=1000)
                                 Index Cond: (project_daily_statistics_archived.id = "ANY_subquery".id)
                                 Filter: (project_daily_statistics_archived.date >= '2025-09-01'::date)
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=5000
                                 I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off'
Time: 13.934 ms
  - planning: 2.021 ms
  - execution: 11.913 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

filter_sub_batch_content method scans by ID first and then apply date as we expected. 0 records were written because all those records already exist in the project_daily_statistics table.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Emma Park

Merge request reports

Loading