Optimize StuckCiJobsWorker running builds query
What does this MR do?
In !62239 (merged) we optimized queries related to selecting pending stuck builds. This applies the same optimization to running builds selected by the StuckCiJobsWorker. Running builds were not initially included in the last MR because the initial solution generated a different and inefficient query plan when the build status was 'running'.
Related Issues: #334399 (closed), #291010 (closed)
I found that the running query is also causing the most frequent timeouts(with 634 events):
Database
Existing Query (Existing indexes)
explain SELECT "ci_builds".*
FROM "ci_builds"
WHERE "ci_builds"."type" = 'Ci::Build'
AND ("ci_builds"."status" IN ('running'))
AND ci_builds.updated_at < '2021-06-23 14:07:20.385873' limit 100;
Limit (cost=0.42..145.19 rows=100 width=1282)
-> Index Scan using index_ci_builds_runner_id_running on ci_builds (cost=0.42..429826.47 rows=296912 width=1282)
Filter: (updated_at < '2021-06-23 14:07:20.385873'::timestamp without time zone)
Time: 9.658 min
- planning: 24.088 ms
- execution: 9.658 min
- I/O read: 9.489 min
- I/O write: 0.000 ms
Shared buffers:
- hits: 129456 (~1011.40 MiB) from the buffer pool
- reads: 433393 (~3.30 GiB) from the OS file cache, including disk I/O
- dirtied: 159093 (~1.20 GiB)
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/4766/commands/16979
New Query
explain SELECT "ci_builds".*
FROM "ci_builds"
WHERE "ci_builds"."type" = 'Ci::Build'
AND ("ci_builds"."status" IN ('running'))
AND ((ci_builds.created_at BETWEEN '2021-05-30 15:07:20.385754' AND '2021-06-04 14:07:20.385873')
AND (ci_builds.updated_at BETWEEN '2021-05-30 15:07:20.385754' AND '2021-06-04 14:07:20.385873'))
limit 100;
Explain Plan
Limit (cost=0.58..3560.61 rows=26 width=1281) (actual time=0.078..4.541 rows=100 loops=1)
Buffers: shared hit=1070
I/O Timings: read=0.000 write=0.000
-> Index Scan using ci_builds_gitlab_monitor_metrics on public.ci_builds (cost=0.58..3560.61 rows=26 width=1281) (actual time=0.076..4.519 rows=100 loops=1)
Index Cond: (((ci_builds.status)::text = 'running'::text) AND (ci_builds.created_at >= '2021-05-30 15:07:20.385754'::timestamp without time zone) AND (ci_builds.created_at <= '2021-06-04 14:07:20.385873'::timestamp without time zone))
Filter: ((ci_builds.updated_at >= '2021-05-30 15:07:20.385754'::timestamp without time zone) AND (ci_builds.updated_at <= '2021-06-04 14:07:20.385873'::timestamp without time zone))
Rows Removed by Filter: 0
Buffers: shared hit=1070
I/O Timings: read=0.000 write=0.000
Cold Cache Perf
Time: 735.886 ms
- planning: 24.521 ms
- execution: 711.365 ms
- I/O read: 633.322 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1198 (~9.40 MiB) from the buffer pool
- reads: 13338 (~104.20 MiB) from the OS file cache, including disk I/O
- dirtied: 2540 (~19.80 MiB)
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/4460/commands/15565
Warm Cache Perf
Time: 5.155 ms
- planning: 0.568 ms
- execution: 4.587 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1070 (~8.40 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/4460/commands/15566
Does this MR meet the acceptance criteria?
Conformity
-
I have included a changelog entry, or it's not needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Merge request reports
Activity
changed milestone to %14.0
assigned to @allison.browne
- A deleted user
added backend label
1 Message 📖 This merge request adds or changes files that require a review from the Database team. This merge request requires a database review. To make sure these changes are reviewed, take the following steps:
- Ensure the merge request has database and databasereview pending labels. If the merge request modifies database files, Danger will do this for you.
- Prepare your MR for database review according to the docs.
- Assign and mention the database reviewer suggested by Reviewer Roulette.
Reviewer roulette
Changes that require review have been detected! A merge request is normally reviewed by both a reviewer and a maintainer in its primary category (e.g. frontend or backend), and by a maintainer in all other categories.
To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
Category Reviewer Maintainer backend Tyler Amos ( @tyleramos
) (UTC-4, same timezone as@allison.browne
)Etienne Baqué ( @ebaque
) (UTC+2, 6 hours ahead of@allison.browne
)database Alex Ives ( @alexives
) (UTC-5, 1 hour behind@allison.browne
)Mayra Cabrera ( @mayra-cabrera
) (UTC-5, 1 hour behind@allison.browne
)If needed, you can retry the
danger-review
job that generated this comment.Generated by
🚫 Dangerremoved backend label
marked the checklist item I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) as completed
marked the checklist item I have tested this MR in all supported browsers, or it's not needed. as completed
marked the checklist item I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed. as completed
marked the checklist item I have followed the style guides. as completed
marked the checklist item This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) as completed
marked the checklist item I have self-reviewed this MR per code review guidelines. as completed
marked the checklist item I have included a changelog entry, or it's not needed. (Does this MR need a changelog?) as completed
marked the checklist item I have added/updated documentation, or it's not needed. (Is documentation required?) as completed
marked the checklist item I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) as completed
marked the checklist item I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) as completed
added devopsverify grouppipeline execution typemaintenance labels
added typefeature label
added bugperformance label