Skip to content

Backfill / update timelogs.spent_at where NULL

Lee Tickett requested to merge 37473-backfill-timelogs-spent-at into master

What does this MR do and why?

In #347473 (closed) we identified that timelogs.spent_at is only populated via quick action, not via the API.

This has now been remedied, but we need to backfill/update all of the empty values.

This MR takes does so by using the created_at timestamp.

I previously ran a background migration on timelogs so used that as a template !60439 (merged)

I'm not sure if this is overkill, perhaps a single UPDATE statement would be sufficient?

First run (cold?):

explain UPDATE timelogs SET spent_at = created_at WHERE spent_at IS NULL

 ModifyTable on public.timelogs  (cost=0.43..143519.97 rows=1217054 width=116) (actual time=177948.671..177948.697 rows=0 loops=1)
   Buffers: shared hit=33539769 read=136929 dirtied=157749 written=31390
   I/O Timings: read=135631.349 write=0.000
   ->  Index Scan using index_timelogs_on_project_id_and_spent_at on public.timelogs  (cost=0.43..143519.97 rows=1217054 width=116) (actual time=9.461..68844.401 rows=1257328 loops=1)
         Index Cond: (timelogs.spent_at IS NULL)
         Buffers: shared hit=672703 read=61163 dirtied=188
         I/O Timings: read=66067.123 write=0.000
Time: 2.966 min
  - planning: 1.251 ms
  - execution: 2.966 min
    - I/O read: 2.261 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 33539769 (~255.90 GiB) from the buffer pool
  - reads: 136929 (~1.00 GiB) from the OS file cache, including disk I/O
  - dirtied: 157749 (~1.20 GiB)
  - writes: 31390 (~245.20 MiB)

Subsequent run:
 ModifyTable on public.timelogs  (cost=0.43..72778.38 rows=1 width=116) (actual time=177.247..177.250 rows=0 loops=1)
   Buffers: shared hit=24290
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using index_timelogs_on_project_id_and_spent_at on public.timelogs  (cost=0.43..72778.38 rows=1 width=116) (actual time=177.243..177.243 rows=0 loops=1)
         Index Cond: (timelogs.spent_at IS NULL)
         Buffers: shared hit=24290
         I/O Timings: read=0.000 write=0.000
Time: 177.804 ms
  - planning: 0.495 ms
  - execution: 177.309 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 24290 (~189.80 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Background Migration Details

1,258,911 rows to update
batch size = 5,000
1,258,911 / 5,000 = 252 batches

Estimated times per batch:
- Records are updated in sub-batches of 100 => 5,000 / 1,000 = 50 total updates
- <4ms for update statement with 100 items [(see linked explain plan)](https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/7650/commands/27184)
Total batch time: <4 * 50 = <200 ms batch

2 mins delay per batch

252 batches * 2 min per batch = 8.4 hours to run all the scheduled jobs
Edited by Lee Tickett

Merge request reports