improve count projects query performance
Queries of the form:
SELECT COUNT(*) FROM "projects" WHERE "projects"."pending_delete" = ? AND (projects.id IN (SELECT "projects"."id" FROM "projects" INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id" WHERE "projects"."pending_delete" = ? AND "project_authorizations"."user_id" = ? UNION SELECT "projects"."id" FROM "projects" WHERE "projects"."visibility_level" IN (?, ?)));
are relatively slow, in part because the UNION forces a unique requirement to be implemented on the query inside of the IN clause, but also because the IN clause itself tends to lead to inefficient query plans as a list has to be constructed and then used. PostgreSQL chooses to build a Hash of the project ids from the IN query and then scan the projects table, but the ends up spending a lot of time throwing away records in the projects table that don't match.
What isn't obvious to the PostgreSQL planner with the way this query is constructed is that the same answer can be achieved by pulling the records with a visibility_level in the set (presumably the public ones) and then adding it to the projects which are available to the user (and not already visible due to being public). Reconstructing the query in this way proves to be much more efficient:
select
(select count(*) from projects where visibility_level in (?, ?)) +
(select count(*) from projects where visibility_level not in (?, ?)
and exists (select 1 from project_authorizations auth where auth.project_id = projects.id and auth.user_id = ?));
This query is ~20x faster, dropping the time required from 3s to ~170ms. All of the accesses are through Index Only scans across the projects and project_authorization tables with very little filtering happening.
The performance of this specific query might be further improved by reducing the width of the projects table, to reduce the number of pages which have to be walked to count the public projects.
This query should be analyzed in the context of the application to ensure that the results being returned are, indeed, correct.
Further improvement to this query is possible by using a materialized view to maintain, in an easily and cheaply retrieved fashion, the number of public projects. This would impose a cost to adding new public projects, but that may be worthwhile as new projects are likely not created nearly as frequently as this query is run and the additional cost would be both understood and accepted by users.