SQL query for counting open TODOs is overly complex
The query used for counting the open TODOs currently looks like the following:
SELECT COUNT(*)
FROM todos
INNER JOIN projects ON projects.id = todos.project_id
WHERE todos.user_id = 1
AND todos.state IN ('pending')
AND (
EXISTS (
SELECT 1
FROM project_authorizations
WHERE project_authorizations.user_id = 1
AND project_authorizations.project_id = projects.id
)
OR projects.visibility_level IN (0,10,20)
)
AND projects.id IN (
SELECT todos.project_id
FROM todos
WHERE todos.user_id = 1
AND todos.state IN ('pending')
);
On GitLab.com this produces the following plan on the 1st call:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9384.91..9384.92 rows=1 width=8) (actual time=1178.523..1178.523 rows=1 loops=1)
-> Nested Loop (cost=4300.60..9384.91 rows=1 width=0) (actual time=1161.140..1178.114 rows=1358 loops=1)
-> Hash Join (cost=4300.17..8593.95 rows=89 width=8) (actual time=1161.071..1168.195 rows=1358 loops=1)
Hash Cond: (todos.project_id = todos_1.project_id)
-> Index Scan using index_todos_on_user_id on todos (cost=0.43..4286.68 rows=528 width=4) (actual time=828.196..834.380 rows=1358 loops=1)
Index Cond: (user_id = 1)
Filter: ((state)::text = 'pending'::text)
Rows Removed by Filter: 1480
-> Hash (cost=4293.22..4293.22 rows=522 width=4) (actual time=332.859..332.859 rows=54 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> HashAggregate (cost=4288.00..4293.22 rows=522 width=4) (actual time=332.812..332.834 rows=54 loops=1)
Group Key: todos_1.project_id
-> Index Scan using index_todos_on_user_id on todos todos_1 (cost=0.43..4286.68 rows=528 width=4) (actual time=0.022..331.988 rows=1358 loops=1)
Index Cond: (user_id = 1)
Filter: ((state)::text = 'pending'::text)
Rows Removed by Filter: 1480
-> Index Scan using projects_pkey on projects (cost=0.43..8.88 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1358)
Index Cond: (id = todos.project_id)
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
SubPlan 1
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations (cost=0.43..4.45 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1358)
Index Cond: ((user_id = 1) AND (project_id = projects.id))
Heap Fetches: 1
SubPlan 2
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1 (cost=0.43..7.59 rows=74 width=4) (never executed)
Index Cond: (user_id = 1)
Heap Fetches: 0
Planning time: 1.110 ms
Execution time: 1178.602 ms
If you run the same query again the plan is:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9384.91..9384.92 rows=1 width=8) (actual time=19.075..19.075 rows=1 loops=1)
-> Nested Loop (cost=4300.60..9384.91 rows=1 width=0) (actual time=6.037..18.625 rows=1358 loops=1)
-> Hash Join (cost=4300.17..8593.95 rows=89 width=8) (actual time=6.013..10.138 rows=1358 loops=1)
Hash Cond: (todos.project_id = todos_1.project_id)
-> Index Scan using index_todos_on_user_id on todos (cost=0.43..4286.68 rows=528 width=4) (actual time=0.024..3.278 rows=1358 loops=1)
Index Cond: (user_id = 1)
Filter: ((state)::text = 'pending'::text)
Rows Removed by Filter: 1480
-> Hash (cost=4293.22..4293.22 rows=522 width=4) (actual time=5.982..5.982 rows=54 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> HashAggregate (cost=4288.00..4293.22 rows=522 width=4) (actual time=5.945..5.965 rows=54 loops=1)
Group Key: todos_1.project_id
-> Index Scan using index_todos_on_user_id on todos todos_1 (cost=0.43..4286.68 rows=528 width=4) (actual time=0.012..5.411 rows=1358 loops=1)
Index Cond: (user_id = 1)
Filter: ((state)::text = 'pending'::text)
Rows Removed by Filter: 1480
-> Index Scan using projects_pkey on projects (cost=0.43..8.88 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1358)
Index Cond: (id = todos.project_id)
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
SubPlan 1
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations (cost=0.43..4.45 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1358)
Index Cond: ((user_id = 1) AND (project_id = projects.id))
Heap Fetches: 1
SubPlan 2
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1 (cost=0.43..7.59 rows=74 width=4) (never executed)
Index Cond: (user_id = 1)
Heap Fetches: 0
Planning time: 0.787 ms
Execution time: 19.145 ms
There are some problems with this query:
- The
OR projects.visibility_leveleffectively nullifies the authorization check because its result includes public, internal, and private projects - The
WHERE projects.id IN ( ... )is completely redundant
We can instead just use the following query to achieve the same value in a little bit less time:
SELECT COUNT(*)
FROM todos
INNER JOIN projects ON projects.id = todos.project_id
WHERE todos.user_id = 1
AND todos.state IN ('pending')
AND (
EXISTS (
SELECT 1
FROM project_authorizations
WHERE project_authorizations.user_id = 1
AND project_authorizations.project_id = projects.id
)
OR projects.visibility_level IN (10, 20)
);
This produces plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8980.46..8980.47 rows=1 width=8) (actual time=14.295..14.296 rows=1 loops=1)
-> Nested Loop (cost=0.86..8979.14 rows=528 width=0) (actual time=0.059..13.876 rows=1358 loops=1)
-> Index Scan using index_todos_on_user_id on todos (cost=0.43..4286.68 rows=528 width=4) (actual time=0.035..5.652 rows=1358 loops=1)
Index Cond: (user_id = 1)
Filter: ((state)::text = 'pending'::text)
Rows Removed by Filter: 1480
-> Index Scan using projects_pkey on projects (cost=0.43..8.88 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1358)
Index Cond: (id = todos.project_id)
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
SubPlan 1
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations (cost=0.43..4.45 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1358)
Index Cond: ((user_id = 1) AND (project_id = projects.id))
Heap Fetches: 1
SubPlan 2
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1 (cost=0.43..7.59 rows=74 width=4) (never executed)
Index Cond: (user_id = 1)
Heap Fetches: 0
Planning time: 0.462 ms
Execution time: 14.348 ms
We can take this a step further and just completely ignore the authorizations, producing:
SELECT COUNT(*)
FROM todos
WHERE user_id = 1
AND state = 'pending';
This produces plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4288.00..4288.01 rows=1 width=8) (actual time=5.775..5.775 rows=1 loops=1)
-> Index Scan using index_todos_on_user_id on todos (cost=0.43..4286.68 rows=528 width=0) (actual time=0.029..5.365 rows=1358 loops=1)
Index Cond: (user_id = 1)
Filter: ((state)::text = 'pending'::text)
Rows Removed by Filter: 1480
Planning time: 0.074 ms
Execution time: 5.803 ms
There are some arguments in favour of this:
- Hiding todos for unauthorized projects means they will always remain in the pending state, meaning we always have to filter them out (since a user can't mark them as read)
- TODOs are only created when you have access to a project, this means that keeping pending TODOs around after you leave a project doesn't expose more information than you already knew
- Ironically this is what the current behaviour already does because of the
OR visibility_level, so we wouldn't change behaviour and still increase performance
If we do want to include authorizations (and get the proper data) we'd have to use this query instead:
SELECT COUNT(*)
FROM todos
INNER JOIN projects ON projects.id = todos.project_id
WHERE todos.user_id = 1
AND todos.state IN ('pending')
AND (
EXISTS (
SELECT 1
FROM project_authorizations
WHERE project_authorizations.user_id = 1
AND project_authorizations.project_id = projects.id
)
OR projects.visibility_level IN (10, 20)
);