Bad performance of queries for milestones filtering
The fix https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23098#note_125777941 changes the milestones filter. The corresponding query:
--explain (analyze, buffers)
SELECT
issues.*,
(
SELECT MIN("label_priorities"."priority")
FROM "labels"
INNER JOIN "label_links" ON "label_links"."label_id" = "labels"."id"
LEFT OUTER JOIN "label_priorities" ON "labels"."id" = "label_priorities"."label_id"
WHERE
(label_priorities.project_id = issues.project_id)
AND (label_links.target_id = issues.id)
AND "label_links"."target_type" = 'Issue'
) AS highest_priority,
MIN(milestones.due_date)
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
LEFT OUTER JOIN milestones ON issues.milestone_id = milestones.id
WHERE
(
EXISTS (
SELECT 1
FROM "project_authorizations"
WHERE
"project_authorizations"."user_id" = 1
AND (project_authorizations.project_id = projects.id)
) OR projects.visibility_level IN (0,10,20)
) AND (
"project_features"."issues_access_level" IN (10, 20, 30)
OR "project_features"."issues_access_level" IS NULL
) AND ("issues"."state" IN ('opened'))
AND "projects"."archived" = false --$1
AND "issues"."milestone_id" IN (
SELECT DISTINCT ON (project_id, group_id) id
FROM "milestones"
WHERE "milestones"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE
(
EXISTS (
SELECT 1
FROM "project_authorizations"
WHERE
"project_authorizations"."user_id" = 1
AND (project_authorizations.project_id = projects.id)
) OR projects.visibility_level IN (0,10,20)
) AND (
"project_features"."issues_access_level" IN (10, 20, 30)
OR "project_features"."issues_access_level" IS NULL
)
) AND ("milestones"."state" IN ('active')
) AND (due_date > '2019-01-09 06:35:15.330524')
ORDER BY
"milestones"."project_id" ASC,
"milestones"."group_id" ASC,
"milestones"."due_date" ASC
) AND "projects"."archived" = false --$2
GROUP BY "issues"."id"
ORDER BY
MIN(milestones.due_date) ASC NULLS LAST,
highest_priority ASC NULLS LAST,
"issues"."id" DESC
LIMIT 25 --$3
OFFSET 0 --$4
;
Its plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Limit (cost=11469955.64..11469955.70 rows=25 width=754) (actual time=1164.845..1164.871 rows=25 loops=1)
Buffers: shared hit=647915, temp read=3154 written=3152
-> Sort (cost=11469955.64..11471108.27 rows=461052 width=754) (actual time=1164.844..1164.849 rows=25 loops=1)
Sort Key: (min(milestones.due_date)), ((SubPlan 1)), issues.id DESC
Sort Method: top-N heapsort Memory: 47kB
Buffers: shared hit=647915, temp read=3154 written=3152
-> GroupAggregate (cost=5195913.39..11456945.08 rows=461052 width=754) (actual time=841.752..1151.577 rows=19141 loops=1)
Group Key: issues.id
Buffers: shared hit=647915, temp read=3154 written=3152
-> Sort (cost=5195913.39..5197066.02 rows=461052 width=750) (actual time=841.637..851.890 rows=19141 loops=1)
Sort Key: issues.id DESC
Sort Method: external merge Disk: 12728kB
Buffers: shared hit=378636, temp read=3154 written=3152
-> Hash Left Join (cost=133759.04..4846822.91 rows=461052 width=750) (actual time=438.771..785.141 rows=19141 loops=1)
Hash Cond: (issues.milestone_id = milestones.id)
Buffers: shared hit=378636, temp read=1562 written=1560
-> Nested Loop Left Join (cost=53813.87..4406857.16 rows=461052 width=746) (actual time=139.862..416.661 rows=19141 loops=1)
Filter: ((project_features.issues_access_level = ANY ('{10,20,30}'::integer[])) OR (project_features.issues_access_level IS NULL))
Buffers: shared hit=363302
-> Nested Loop (cost=53813.44..3465909.08 rows=469820 width=750) (actual time=139.848..356.084 rows=19141 loops=1)
Buffers: shared hit=286561
-> Nested Loop (cost=53813.01..62224.60 rows=512506 width=746) (actual time=139.811..250.203 rows=19238 loops=1)
Buffers: shared hit=132686
-> HashAggregate (cost=53812.45..53814.45 rows=200 width=4) (actual time=139.781..141.978 rows=3613 loops=1)
Group Key: milestones_1.id
Buffers: shared hit=89076
-> Unique (cost=53715.89..53752.10 rows=4828 width=16) (actual time=137.191..138.830 rows=3613 loops=1)
Buffers: shared hit=89076
-> Sort (cost=53715.89..53727.96 rows=4828 width=16) (actual time=137.189..137.862 rows=6679 loops=1)
Sort Key: milestones_1.project_id, milestones_1.group_id, milestones_1.due_date
Sort Method: quicksort Memory: 506kB
Buffers: shared hit=89076
-> Nested Loop Semi Join (cost=1.29..53420.48 rows=4828 width=16) (actual time=0.099..133.087 rows=6679 loops=1)
Buffers: shared hit=89076
-> Index Scan using index_milestones_on_due_date on milestones milestones_1 (cost=0.42..5323.84 rows=5083 width=16) (actual time=0.02
4..17.858 rows=8079 loops=1)
Index Cond: (due_date > '2019-01-09'::date)
Filter: ((state)::text = 'active'::text)
Rows Removed by Filter: 197
Buffers: shared hit=8255
-> Nested Loop Left Join (cost=0.86..9.45 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=8079)
Filter: ((project_features_1.issues_access_level = ANY ('{10,20,30}'::integer[])) OR (project_features_1.issues_access_level IS N
ULL))
Rows Removed by Filter: 0
Buffers: shared hit=80821
-> Index Scan using projects_pkey on projects projects_1 (cost=0.43..8.94 rows=1 width=4) (actual time=0.008..0.008 rows=1 loop
s=8079)
Index Cond: (id = milestones_1.project_id)
Filter: ((alternatives: SubPlan 4 or hashed SubPlan 5) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
Buffers: shared hit=53918
SubPlan 4
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations proje
ct_authorizations_2 (cost=0.56..4.58 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=6723)
Index Cond: ((user_id = 1) AND (project_id = projects_1.id))
Heap Fetches: 14
Buffers: shared hit=27026
SubPlan 5
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations proje
ct_authorizations_3 (cost=0.56..815.81 rows=1565 width=4) (never executed)
Index Cond: (user_id = 1)
Heap Fetches: 0
-> Index Scan using index_project_features_on_project_id on project_features project_features_1 (cost=0.43..0.50 rows=1 width=8
) (actual time=0.006..0.006 rows=1 loops=6723)
Index Cond: (projects_1.id = project_id)
Buffers: shared hit=26903
-> Index Scan using index_issues_on_milestone_id on issues (cost=0.56..41.97 rows=8 width=746) (actual time=0.010..0.029 rows=5 loops=3613)
Index Cond: (milestone_id = milestones_1.id)
Filter: ((state)::text = 'opened'::text)
Rows Removed by Filter: 3
Buffers: shared hit=43610
-> Index Scan using projects_pkey on projects (cost=0.43..6.63 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=19238)
Index Cond: (id = issues.project_id)
Filter: ((NOT archived) AND (NOT archived) AND ((alternatives: SubPlan 2 or hashed SubPlan 3) OR (visibility_level = ANY ('{0,10,20}'::integer[]))))
Rows Removed by Filter: 0
Buffers: shared hit=153875
SubPlan 2
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations (cost=0.56..4.58 rows=1 width=0) (actual
time=0.002..0.002 rows=0 loops=19141)
Index Cond: ((user_id = 1) AND (project_id = projects.id))
Heap Fetches: 6
Buffers: shared hit=76807
SubPlan 3
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1 (cost=0.56..815
.81 rows=1565 width=4) (never executed)
Index Cond: (user_id = 1)
Heap Fetches: 0
-> Index Scan using index_project_features_on_project_id on project_features (cost=0.43..1.99 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=19141)
Index Cond: (projects.id = project_id)
Buffers: shared hit=76741
-> Hash (cost=66216.52..66216.52 rows=488052 width=8) (actual time=295.533..295.533 rows=488052 loops=1)
Buckets: 524288 Batches: 2 Memory Usage: 13231kB
Buffers: shared hit=15334, temp written=775
-> Seq Scan on milestones (cost=0.00..66216.52 rows=488052 width=8) (actual time=0.017..159.779 rows=488052 loops=1)
Buffers: shared hit=15334
SubPlan 1
-> Aggregate (cost=13.55..13.56 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=19141)
Buffers: shared hit=269279
-> Nested Loop (cost=1.28..13.55 rows=1 width=4) (actual time=0.013..0.014 rows=0 loops=19141)
Join Filter: (label_links.label_id = label_priorities.label_id)
Buffers: shared hit=269279
-> Nested Loop (cost=0.99..12.90 rows=2 width=8) (actual time=0.008..0.011 rows=2 loops=19141)
Buffers: shared hit=205898
-> Index Scan using index_label_links_on_target_id_and_target_type on label_links (cost=0.56..5.98 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=19141)
Index Cond: ((target_id = issues.id) AND ((target_type)::text = 'Issue'::text))
Buffers: shared hit=99684
-> Index Only Scan using labels_pkey on labels (cost=0.43..3.45 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=29684)
Index Cond: (id = label_links.label_id)
Heap Fetches: 5104
Buffers: shared hit=106214
-> Index Scan using index_label_priorities_on_label_id on label_priorities (cost=0.29..0.31 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=29683)
Index Cond: (label_id = labels.id)
Filter: (project_id = issues.project_id)
Rows Removed by Filter: 0
Buffers: shared hit=63381
Planning time: 9.271 ms
Execution time: 1169.081 ms
(108 rows)
This is not the problem of this particular filter – the way how milestones are being retrieved now (using LEFT JOINs and EXISTS) is very inefficient (see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23098#note_127374351 and https://gitlab.com/gitlab-org/gitlab-ce/issues/49533).
The plan shows multiple problems – notice Seq Scan on milestones
, amounts of data processed (Buffers: shared hit=89076
), external merge Disk: 12728kB
, temp read=1562 written=1560
.
This query needs to be improved. DB schema refactoring (do we really need to have project_features
table which is 1:1 to projects
?), denormalization (can we propagate projects.visibitility_level
to milestones to decrease amount of tables involved to the query) might be needed.
See also: https://gitlab.com/gitlab-org/gitlab-ce/issues/49533