Skip to content

Optimize StuckCiJobsWorker running builds query

What does this MR do?

In !62239 (merged) we optimized queries related to selecting pending stuck builds. This applies the same optimization to running builds selected by the StuckCiJobsWorker. Running builds were not initially included in the last MR because the initial solution generated a different and inefficient query plan when the build status was 'running'.

Related Issues: #334399 (closed), #291010 (closed)

I found that the running query is also causing the most frequent timeouts(with 634 events):

Screen_Shot_2021-06-04_at_2.01.35_PM

Database

Existing Query (Existing indexes)

explain SELECT "ci_builds".*  
FROM "ci_builds"  
WHERE "ci_builds"."type" = 'Ci::Build'  
AND ("ci_builds"."status" IN ('running'))  
AND ci_builds.updated_at < '2021-06-23 14:07:20.385873' limit 100;
Limit  (cost=0.42..145.19 rows=100 width=1282)
  ->  Index Scan using index_ci_builds_runner_id_running on ci_builds  (cost=0.42..429826.47 rows=296912 width=1282)
        Filter: (updated_at < '2021-06-23 14:07:20.385873'::timestamp without time zone)
Time: 9.658 min
  - planning: 24.088 ms
  - execution: 9.658 min
    - I/O read: 9.489 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 129456 (~1011.40 MiB) from the buffer pool
  - reads: 433393 (~3.30 GiB) from the OS file cache, including disk I/O
  - dirtied: 159093 (~1.20 GiB)
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/4766/commands/16979

New Query

explain SELECT "ci_builds".* 
FROM "ci_builds" 
WHERE "ci_builds"."type" = 'Ci::Build' 
AND ("ci_builds"."status" IN ('running')) 
AND ((ci_builds.created_at BETWEEN '2021-05-30 15:07:20.385754' AND '2021-06-04 14:07:20.385873') 
AND (ci_builds.updated_at BETWEEN '2021-05-30 15:07:20.385754' AND '2021-06-04 14:07:20.385873'))
limit 100;

Explain Plan

 Limit  (cost=0.58..3560.61 rows=26 width=1281) (actual time=0.078..4.541 rows=100 loops=1)
   Buffers: shared hit=1070
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using ci_builds_gitlab_monitor_metrics on public.ci_builds  (cost=0.58..3560.61 rows=26 width=1281) (actual time=0.076..4.519 rows=100 loops=1)
         Index Cond: (((ci_builds.status)::text = 'running'::text) AND (ci_builds.created_at >= '2021-05-30 15:07:20.385754'::timestamp without time zone) AND (ci_builds.created_at <= '2021-06-04 14:07:20.385873'::timestamp without time zone))
         Filter: ((ci_builds.updated_at >= '2021-05-30 15:07:20.385754'::timestamp without time zone) AND (ci_builds.updated_at <= '2021-06-04 14:07:20.385873'::timestamp without time zone))
         Rows Removed by Filter: 0
         Buffers: shared hit=1070
         I/O Timings: read=0.000 write=0.000

Cold Cache Perf

Time: 735.886 ms
  - planning: 24.521 ms
  - execution: 711.365 ms
    - I/O read: 633.322 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1198 (~9.40 MiB) from the buffer pool
  - reads: 13338 (~104.20 MiB) from the OS file cache, including disk I/O
  - dirtied: 2540 (~19.80 MiB)
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/4460/commands/15565

Warm Cache Perf

Time: 5.155 ms
  - planning: 0.568 ms
  - execution: 4.587 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1070 (~8.40 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/4460/commands/15566

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Allison Browne

Merge request reports