SQL query for counting open TODOs is overly complex

The query used for counting the open TODOs currently looks like the following:

SELECT COUNT(*)
FROM todos
INNER JOIN projects ON projects.id = todos.project_id
WHERE todos.user_id = 1
AND todos.state IN ('pending')
AND (
    EXISTS (
        SELECT 1
        FROM project_authorizations
        WHERE project_authorizations.user_id = 1
        AND project_authorizations.project_id = projects.id
    ) 
    OR projects.visibility_level IN (0,10,20)
)
AND projects.id IN (
    SELECT todos.project_id
    FROM todos
    WHERE todos.user_id = 1
    AND todos.state IN ('pending')
);

On GitLab.com this produces the following plan on the 1st call:

                                                                                                     QUERY PLAN                                                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9384.91..9384.92 rows=1 width=8) (actual time=1178.523..1178.523 rows=1 loops=1)
   ->  Nested Loop  (cost=4300.60..9384.91 rows=1 width=0) (actual time=1161.140..1178.114 rows=1358 loops=1)
         ->  Hash Join  (cost=4300.17..8593.95 rows=89 width=8) (actual time=1161.071..1168.195 rows=1358 loops=1)
               Hash Cond: (todos.project_id = todos_1.project_id)
               ->  Index Scan using index_todos_on_user_id on todos  (cost=0.43..4286.68 rows=528 width=4) (actual time=828.196..834.380 rows=1358 loops=1)
                     Index Cond: (user_id = 1)
                     Filter: ((state)::text = 'pending'::text)
                     Rows Removed by Filter: 1480
               ->  Hash  (cost=4293.22..4293.22 rows=522 width=4) (actual time=332.859..332.859 rows=54 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  HashAggregate  (cost=4288.00..4293.22 rows=522 width=4) (actual time=332.812..332.834 rows=54 loops=1)
                           Group Key: todos_1.project_id
                           ->  Index Scan using index_todos_on_user_id on todos todos_1  (cost=0.43..4286.68 rows=528 width=4) (actual time=0.022..331.988 rows=1358 loops=1)
                                 Index Cond: (user_id = 1)
                                 Filter: ((state)::text = 'pending'::text)
                                 Rows Removed by Filter: 1480
         ->  Index Scan using projects_pkey on projects  (cost=0.43..8.88 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1358)
               Index Cond: (id = todos.project_id)
               Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
               SubPlan 1
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.43..4.45 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1358)
                       Index Cond: ((user_id = 1) AND (project_id = projects.id))
                       Heap Fetches: 1
               SubPlan 2
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1  (cost=0.43..7.59 rows=74 width=4) (never executed)
                       Index Cond: (user_id = 1)
                       Heap Fetches: 0
 Planning time: 1.110 ms
 Execution time: 1178.602 ms

If you run the same query again the plan is:

                                                                                                     QUERY PLAN                                                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9384.91..9384.92 rows=1 width=8) (actual time=19.075..19.075 rows=1 loops=1)
   ->  Nested Loop  (cost=4300.60..9384.91 rows=1 width=0) (actual time=6.037..18.625 rows=1358 loops=1)
         ->  Hash Join  (cost=4300.17..8593.95 rows=89 width=8) (actual time=6.013..10.138 rows=1358 loops=1)
               Hash Cond: (todos.project_id = todos_1.project_id)
               ->  Index Scan using index_todos_on_user_id on todos  (cost=0.43..4286.68 rows=528 width=4) (actual time=0.024..3.278 rows=1358 loops=1)
                     Index Cond: (user_id = 1)
                     Filter: ((state)::text = 'pending'::text)
                     Rows Removed by Filter: 1480
               ->  Hash  (cost=4293.22..4293.22 rows=522 width=4) (actual time=5.982..5.982 rows=54 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  HashAggregate  (cost=4288.00..4293.22 rows=522 width=4) (actual time=5.945..5.965 rows=54 loops=1)
                           Group Key: todos_1.project_id
                           ->  Index Scan using index_todos_on_user_id on todos todos_1  (cost=0.43..4286.68 rows=528 width=4) (actual time=0.012..5.411 rows=1358 loops=1)
                                 Index Cond: (user_id = 1)
                                 Filter: ((state)::text = 'pending'::text)
                                 Rows Removed by Filter: 1480
         ->  Index Scan using projects_pkey on projects  (cost=0.43..8.88 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1358)
               Index Cond: (id = todos.project_id)
               Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
               SubPlan 1
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.43..4.45 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1358)
                       Index Cond: ((user_id = 1) AND (project_id = projects.id))
                       Heap Fetches: 1
               SubPlan 2
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1  (cost=0.43..7.59 rows=74 width=4) (never executed)
                       Index Cond: (user_id = 1)
                       Heap Fetches: 0
 Planning time: 0.787 ms
 Execution time: 19.145 ms

There are some problems with this query:

  1. The OR projects.visibility_level effectively nullifies the authorization check because its result includes public, internal, and private projects
  2. The WHERE projects.id IN ( ... ) is completely redundant

We can instead just use the following query to achieve the same value in a little bit less time:

SELECT COUNT(*)
FROM todos
INNER JOIN projects ON projects.id = todos.project_id
WHERE todos.user_id = 1
AND todos.state IN ('pending')
AND (
    EXISTS (
        SELECT 1
        FROM project_authorizations
        WHERE project_authorizations.user_id = 1
        AND project_authorizations.project_id = projects.id
    ) 
    OR projects.visibility_level IN (10, 20)
);

This produces plan:

                                                                                                     QUERY PLAN                                                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8980.46..8980.47 rows=1 width=8) (actual time=14.295..14.296 rows=1 loops=1)
   ->  Nested Loop  (cost=0.86..8979.14 rows=528 width=0) (actual time=0.059..13.876 rows=1358 loops=1)
         ->  Index Scan using index_todos_on_user_id on todos  (cost=0.43..4286.68 rows=528 width=4) (actual time=0.035..5.652 rows=1358 loops=1)
               Index Cond: (user_id = 1)
               Filter: ((state)::text = 'pending'::text)
               Rows Removed by Filter: 1480
         ->  Index Scan using projects_pkey on projects  (cost=0.43..8.88 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1358)
               Index Cond: (id = todos.project_id)
               Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{0,10,20}'::integer[])))
               SubPlan 1
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.43..4.45 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1358)
                       Index Cond: ((user_id = 1) AND (project_id = projects.id))
                       Heap Fetches: 1
               SubPlan 2
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1  (cost=0.43..7.59 rows=74 width=4) (never executed)
                       Index Cond: (user_id = 1)
                       Heap Fetches: 0
 Planning time: 0.462 ms
 Execution time: 14.348 ms

We can take this a step further and just completely ignore the authorizations, producing:

SELECT COUNT(*)
FROM todos 
WHERE user_id = 1 
AND state = 'pending';

This produces plan:

                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4288.00..4288.01 rows=1 width=8) (actual time=5.775..5.775 rows=1 loops=1)
   ->  Index Scan using index_todos_on_user_id on todos  (cost=0.43..4286.68 rows=528 width=0) (actual time=0.029..5.365 rows=1358 loops=1)
         Index Cond: (user_id = 1)
         Filter: ((state)::text = 'pending'::text)
         Rows Removed by Filter: 1480
 Planning time: 0.074 ms
 Execution time: 5.803 ms

There are some arguments in favour of this:

  1. Hiding todos for unauthorized projects means they will always remain in the pending state, meaning we always have to filter them out (since a user can't mark them as read)
  2. TODOs are only created when you have access to a project, this means that keeping pending TODOs around after you leave a project doesn't expose more information than you already knew
  3. Ironically this is what the current behaviour already does because of the OR visibility_level, so we wouldn't change behaviour and still increase performance

If we do want to include authorizations (and get the proper data) we'd have to use this query instead:

SELECT COUNT(*)
FROM todos
INNER JOIN projects ON projects.id = todos.project_id
WHERE todos.user_id = 1
AND todos.state IN ('pending')
AND (
    EXISTS (
        SELECT 1
        FROM project_authorizations
        WHERE project_authorizations.user_id = 1
        AND project_authorizations.project_id = projects.id
    ) 
    OR projects.visibility_level IN (10, 20)
);
Assignee Loading
Time tracking Loading