ProjectsFinder produces SQL queries that are overly complex when filtering by starred projects
When looking into https://gitlab.com/gitlab-org/gitlab-ce/issues/31806#note_32116558 I ran into the following code:
ProjectsFinder.new(current_user: current_user, params: { starred: true }).execute
This ends up producing a query along the lines of the following:
SELECT projects.*
FROM projects
WHERE projects.pending_delete = 'f'
AND (
projects.id IN (
SELECT projects.id
FROM projects
INNER JOIN users_star_projects ON users_star_projects.project_id = projects.id
INNER JOIN project_authorizations ON projects.id = project_authorizations.project_id
WHERE projects.pending_delete = 'f'
AND project_authorizations.user_id = 1
AND users_star_projects.user_id = 1
UNION
SELECT projects.id
FROM projects
INNER JOIN users_star_projects ON users_star_projects.project_id = projects.id
WHERE projects.visibility_level IN (20, 10)
AND users_star_projects.user_id = 1
)
)
ORDER BY projects.id DESC;
This will produce the following plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=68.50..68.51 rows=2 width=517) (actual time=0.110..0.110 rows=2 loops=1)
Sort Key: projects.id DESC
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=51.98..68.49 rows=2 width=517) (actual time=0.089..0.096 rows=2 loops=1)
-> Unique (cost=51.55..51.56 rows=2 width=4) (actual time=0.083..0.085 rows=2 loops=1)
-> Sort (cost=51.55..51.55 rows=2 width=4) (actual time=0.082..0.082 rows=2 loops=1)
Sort Key: projects_1.id
Sort Method: quicksort Memory: 25kB
-> Append (cost=1.15..51.54 rows=2 width=4) (actual time=0.039..0.072 rows=2 loops=1)
-> Nested Loop (cost=1.15..26.27 rows=1 width=4) (actual time=0.039..0.059 rows=1 loops=1)
Join Filter: (users_star_projects.project_id = project_authorizations.project_id)
-> Nested Loop (cost=0.72..25.24 rows=2 width=8) (actual time=0.024..0.038 rows=2 loops=1)
-> Index Only Scan using index_users_star_projects_on_user_id_and_project_id on users_star_projects (cost=0.29..8.33 rows=2 width=4) (actual time=0.011..0.012 rows=2 loops=1)
Index Cond: (user_id = 1)
Heap Fetches: 0
-> Index Scan using projects_pkey on projects projects_1 (cost=0.43..8.45 rows=1 width=4) (actual time=0.011..0.011 rows=1 loops=2)
Index Cond: (id = users_star_projects.project_id)
Filter: (NOT pending_delete)
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations (cost=0.43..0.50 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=2)
Index Cond: ((user_id = 1) AND (project_id = projects_1.id))
Heap Fetches: 0
-> Nested Loop (cost=0.72..25.25 rows=1 width=4) (actual time=0.010..0.012 rows=1 loops=1)
-> Index Only Scan using index_users_star_projects_on_user_id_and_project_id on users_star_projects users_star_projects_1 (cost=0.29..8.33 rows=2 width=4) (actual time=0.001..0.001 rows=2 loops=1)
Index Cond: (user_id = 1)
Heap Fetches: 0
-> Index Scan using projects_pkey on projects projects_2 (cost=0.43..8.45 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=2)
Index Cond: (id = users_star_projects_1.project_id)
Filter: (visibility_level = ANY ('{20,10}'::integer[]))
Rows Removed by Filter: 0
-> Index Scan using projects_pkey on projects (cost=0.43..8.45 rows=1 width=517) (actual time=0.004..0.004 rows=1 loops=2)
Index Cond: (id = projects_1.id)
Filter: (NOT pending_delete)
Planning time: 1.283 ms
Execution time: 0.274 ms
Measuring this query with \timing
produces a query time of roughly 3 ms.
We can reduce this query down the following instead:
SELECT projects.*
FROM projects
INNER JOIN users_star_projects ON users_star_projects.project_id = projects.id
WHERE projects.pending_delete = 'f'
AND users_star_projects.user_id = 1
AND (
EXISTS (
SELECT true
FROM project_authorizations
WHERE project_authorizations.project_id = projects.id
AND project_authorizations.user_id = users_star_projects.user_id
)
OR projects.visibility_level IN (20, 10)
)
ORDER BY projects.id DESC;
This produces the following plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.72..42.15 rows=1 width=517) (actual time=0.361..0.524 rows=2 loops=1)
-> Index Only Scan Backward using index_users_star_projects_on_user_id_and_project_id on users_star_projects (cost=0.29..8.33 rows=2 width=8) (actual time=0.012..0.015 rows=2 loops=1)
Index Cond: (user_id = 1)
Heap Fetches: 0
-> Index Scan using projects_pkey on projects (cost=0.43..16.90 rows=1 width=517) (actual time=0.245..0.252 rows=1 loops=2)
Index Cond: (id = users_star_projects.project_id)
Filter: ((NOT pending_delete) AND ((SubPlan 1) OR (visibility_level = ANY ('{20,10}'::integer[]))))
SubPlan 1
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations (cost=0.43..8.45 rows=1 width=0) (actual time=0.034..0.034 rows=0 loops=2)
Index Cond: ((user_id = users_star_projects.user_id) AND (project_id = projects.id))
Heap Fetches: 0
Planning time: 2.082 ms
Execution time: 0.662 ms
(13 rows)
This query in turn will run in about 1.5 milliseconds, varying a bit over time.
While the timing difference is very small we can clearly see we're using a much more simple plan here. It's possible to further optimize this by removing the pending_delete filter (https://gitlab.com/gitlab-org/gitlab-ce/issues/30708) and maybe messing around with the visibility level filter (though I can't think of any immediate improvements).