Skip to content

Bad performance of queries for milestones filtering

The fix https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23098#note_125777941 changes the milestones filter. The corresponding query:

--explain (analyze, buffers)
SELECT
  issues.*,
  (
    SELECT MIN("label_priorities"."priority")
    FROM "labels"
    INNER JOIN "label_links" ON "label_links"."label_id" = "labels"."id"
    LEFT OUTER JOIN "label_priorities" ON "labels"."id" = "label_priorities"."label_id"
    WHERE
      (label_priorities.project_id = issues.project_id)
      AND (label_links.target_id = issues.id)
      AND "label_links"."target_type" = 'Issue'
  ) AS highest_priority,
  MIN(milestones.due_date)
FROM "issues"
INNER JOIN "projects" ON "projects"."id" = "issues"."project_id"
LEFT JOIN project_features ON projects.id = project_features.project_id
LEFT OUTER JOIN milestones ON issues.milestone_id = milestones.id
WHERE 
  (
    EXISTS (
      SELECT 1
      FROM "project_authorizations"
      WHERE
        "project_authorizations"."user_id" = 1
        AND (project_authorizations.project_id = projects.id)
    ) OR projects.visibility_level IN (0,10,20)
  ) AND (
    "project_features"."issues_access_level" IN (10, 20, 30)
    OR "project_features"."issues_access_level" IS NULL
  ) AND ("issues"."state" IN ('opened'))
  AND "projects"."archived" = false --$1
  AND "issues"."milestone_id" IN (
    SELECT DISTINCT ON (project_id, group_id) id
    FROM "milestones"
    WHERE "milestones"."project_id" IN (
      SELECT "projects"."id"
      FROM "projects"
      LEFT JOIN project_features ON projects.id = project_features.project_id
      WHERE
        (
          EXISTS (
            SELECT 1
            FROM "project_authorizations"
            WHERE
              "project_authorizations"."user_id" = 1
              AND (project_authorizations.project_id = projects.id)
          ) OR projects.visibility_level IN (0,10,20)
        ) AND (
          "project_features"."issues_access_level" IN (10, 20, 30)
          OR "project_features"."issues_access_level" IS NULL
        )
      ) AND ("milestones"."state" IN ('active')
    ) AND (due_date > '2019-01-09 06:35:15.330524')
    ORDER BY
      "milestones"."project_id" ASC,
      "milestones"."group_id" ASC,
      "milestones"."due_date" ASC
  ) AND "projects"."archived" = false --$2
GROUP BY "issues"."id"
ORDER BY
  MIN(milestones.due_date) ASC NULLS LAST,
  highest_priority ASC NULLS LAST,
  "issues"."id" DESC
LIMIT 25 --$3
OFFSET 0 --$4
;

Its plan:

                                                                                                                                                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
 Limit  (cost=11469955.64..11469955.70 rows=25 width=754) (actual time=1164.845..1164.871 rows=25 loops=1)
   Buffers: shared hit=647915, temp read=3154 written=3152
   ->  Sort  (cost=11469955.64..11471108.27 rows=461052 width=754) (actual time=1164.844..1164.849 rows=25 loops=1)
         Sort Key: (min(milestones.due_date)), ((SubPlan 1)), issues.id DESC
         Sort Method: top-N heapsort  Memory: 47kB
         Buffers: shared hit=647915, temp read=3154 written=3152
         ->  GroupAggregate  (cost=5195913.39..11456945.08 rows=461052 width=754) (actual time=841.752..1151.577 rows=19141 loops=1)
               Group Key: issues.id
               Buffers: shared hit=647915, temp read=3154 written=3152
               ->  Sort  (cost=5195913.39..5197066.02 rows=461052 width=750) (actual time=841.637..851.890 rows=19141 loops=1)
                     Sort Key: issues.id DESC
                     Sort Method: external merge  Disk: 12728kB
                     Buffers: shared hit=378636, temp read=3154 written=3152
                     ->  Hash Left Join  (cost=133759.04..4846822.91 rows=461052 width=750) (actual time=438.771..785.141 rows=19141 loops=1)
                           Hash Cond: (issues.milestone_id = milestones.id)
                           Buffers: shared hit=378636, temp read=1562 written=1560
                           ->  Nested Loop Left Join  (cost=53813.87..4406857.16 rows=461052 width=746) (actual time=139.862..416.661 rows=19141 loops=1)
                                 Filter: ((project_features.issues_access_level = ANY ('{10,20,30}'::integer[])) OR (project_features.issues_access_level IS NULL))
                                 Buffers: shared hit=363302
                                 ->  Nested Loop  (cost=53813.44..3465909.08 rows=469820 width=750) (actual time=139.848..356.084 rows=19141 loops=1)
                                       Buffers: shared hit=286561
                                       ->  Nested Loop  (cost=53813.01..62224.60 rows=512506 width=746) (actual time=139.811..250.203 rows=19238 loops=1)
                                             Buffers: shared hit=132686
                                             ->  HashAggregate  (cost=53812.45..53814.45 rows=200 width=4) (actual time=139.781..141.978 rows=3613 loops=1)
                                                   Group Key: milestones_1.id
                                                   Buffers: shared hit=89076
                                                   ->  Unique  (cost=53715.89..53752.10 rows=4828 width=16) (actual time=137.191..138.830 rows=3613 loops=1)
                                                         Buffers: shared hit=89076
                                                         ->  Sort  (cost=53715.89..53727.96 rows=4828 width=16) (actual time=137.189..137.862 rows=6679 loops=1)
                                                               Sort Key: milestones_1.project_id, milestones_1.group_id, milestones_1.due_date
                                                               Sort Method: quicksort  Memory: 506kB
                                                               Buffers: shared hit=89076
                                                               ->  Nested Loop Semi Join  (cost=1.29..53420.48 rows=4828 width=16) (actual time=0.099..133.087 rows=6679 loops=1)
                                                                     Buffers: shared hit=89076
                                                                     ->  Index Scan using index_milestones_on_due_date on milestones milestones_1  (cost=0.42..5323.84 rows=5083 width=16) (actual time=0.02
4..17.858 rows=8079 loops=1)
                                                                           Index Cond: (due_date > '2019-01-09'::date)
                                                                           Filter: ((state)::text = 'active'::text)
                                                                           Rows Removed by Filter: 197
                                                                           Buffers: shared hit=8255
                                                                     ->  Nested Loop Left Join  (cost=0.86..9.45 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=8079)
                                                                           Filter: ((project_features_1.issues_access_level = ANY ('{10,20,30}'::integer[])) OR (project_features_1.issues_access_level IS N
ULL))
                                                                           Rows Removed by Filter: 0
                                                                           Buffers: shared hit=80821
                                                                           ->  Index Scan using projects_pkey on projects projects_1  (cost=0.43..8.94 rows=1 width=4) (actual time=0.008..0.008 rows=1 loop
s=8079)
                                                                                 Index Cond: (id = milestones_1.project_id)
                                                                                 Filter: ((alternatives: SubPlan 4 or hashed SubPlan 5) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
                                                                                 Buffers: shared hit=53918
                                                                                 SubPlan 4
                                                                                   ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations proje
ct_authorizations_2  (cost=0.56..4.58 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=6723)
                                                                                         Index Cond: ((user_id = 1) AND (project_id = projects_1.id))
                                                                                         Heap Fetches: 14
                                                                                         Buffers: shared hit=27026
                                                                                 SubPlan 5
                                                                                   ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations proje
ct_authorizations_3  (cost=0.56..815.81 rows=1565 width=4) (never executed)
                                                                                         Index Cond: (user_id = 1)
                                                                                         Heap Fetches: 0
                                                                           ->  Index Scan using index_project_features_on_project_id on project_features project_features_1  (cost=0.43..0.50 rows=1 width=8
) (actual time=0.006..0.006 rows=1 loops=6723)
                                                                                 Index Cond: (projects_1.id = project_id)
                                                                                 Buffers: shared hit=26903
                                             ->  Index Scan using index_issues_on_milestone_id on issues  (cost=0.56..41.97 rows=8 width=746) (actual time=0.010..0.029 rows=5 loops=3613)
                                                   Index Cond: (milestone_id = milestones_1.id)
                                                   Filter: ((state)::text = 'opened'::text)
                                                   Rows Removed by Filter: 3
                                                   Buffers: shared hit=43610
                                       ->  Index Scan using projects_pkey on projects  (cost=0.43..6.63 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=19238)
                                             Index Cond: (id = issues.project_id)
                                             Filter: ((NOT archived) AND (NOT archived) AND ((alternatives: SubPlan 2 or hashed SubPlan 3) OR (visibility_level = ANY ('{0,10,20}'::integer[]))))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=153875
                                             SubPlan 2
                                               ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.56..4.58 rows=1 width=0) (actual
 time=0.002..0.002 rows=0 loops=19141)
                                                     Index Cond: ((user_id = 1) AND (project_id = projects.id))
                                                     Heap Fetches: 6
                                                     Buffers: shared hit=76807
                                             SubPlan 3
                                               ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1  (cost=0.56..815
.81 rows=1565 width=4) (never executed)
                                                     Index Cond: (user_id = 1)
                                                     Heap Fetches: 0
                                 ->  Index Scan using index_project_features_on_project_id on project_features  (cost=0.43..1.99 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=19141)
                                       Index Cond: (projects.id = project_id)
                                       Buffers: shared hit=76741
                           ->  Hash  (cost=66216.52..66216.52 rows=488052 width=8) (actual time=295.533..295.533 rows=488052 loops=1)
                                 Buckets: 524288  Batches: 2  Memory Usage: 13231kB
                                 Buffers: shared hit=15334, temp written=775
                                 ->  Seq Scan on milestones  (cost=0.00..66216.52 rows=488052 width=8) (actual time=0.017..159.779 rows=488052 loops=1)
                                       Buffers: shared hit=15334
               SubPlan 1
                 ->  Aggregate  (cost=13.55..13.56 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=19141)
                       Buffers: shared hit=269279
                       ->  Nested Loop  (cost=1.28..13.55 rows=1 width=4) (actual time=0.013..0.014 rows=0 loops=19141)
                             Join Filter: (label_links.label_id = label_priorities.label_id)
                             Buffers: shared hit=269279
                             ->  Nested Loop  (cost=0.99..12.90 rows=2 width=8) (actual time=0.008..0.011 rows=2 loops=19141)
                                   Buffers: shared hit=205898
                                   ->  Index Scan using index_label_links_on_target_id_and_target_type on label_links  (cost=0.56..5.98 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=19141)
                                         Index Cond: ((target_id = issues.id) AND ((target_type)::text = 'Issue'::text))
                                         Buffers: shared hit=99684
                                   ->  Index Only Scan using labels_pkey on labels  (cost=0.43..3.45 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=29684)
                                         Index Cond: (id = label_links.label_id)
                                         Heap Fetches: 5104
                                         Buffers: shared hit=106214
                             ->  Index Scan using index_label_priorities_on_label_id on label_priorities  (cost=0.29..0.31 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=29683)
                                   Index Cond: (label_id = labels.id)
                                   Filter: (project_id = issues.project_id)
                                   Rows Removed by Filter: 0
                                   Buffers: shared hit=63381
 Planning time: 9.271 ms
 Execution time: 1169.081 ms
(108 rows)

This is not the problem of this particular filter – the way how milestones are being retrieved now (using LEFT JOINs and EXISTS) is very inefficient (see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23098#note_127374351 and https://gitlab.com/gitlab-org/gitlab-ce/issues/49533).

The plan shows multiple problems – notice Seq Scan on milestones, amounts of data processed (Buffers: shared hit=89076), external merge Disk: 12728kB, temp read=1562 written=1560.

This query needs to be improved. DB schema refactoring (do we really need to have project_features table which is 1:1 to projects?), denormalization (can we propagate projects.visibitility_level to milestones to decrease amount of tables involved to the query) might be needed.

See also: https://gitlab.com/gitlab-org/gitlab-ce/issues/49533

Edited by Nikolay Samokhvalov