select events, BI query
This query has a structure along these lines:
SELECT *
FROM "events"
LEFT OUTER JOIN "projects" ON "projects"."id" = "events"."project_id" AND "projects"."pending_delete" = 'f'
LEFT OUTER JOIN "routes" ON "routes"."source_id" = "projects"."id" AND "routes"."source_type" = 'Project'
LEFT OUTER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id" AND "namespaces"."deleted_at" IS NULL
LEFT OUTER JOIN "users" ON "users"."id" = "events"."author_id"
WHERE
("events"."author_id" IS NOT NULL)
AND "events"."author_id" = ?
AND "projects"."pending_delete" = 'f'
AND (projects.id IN ( SELECT "projects"."id" FROM "projects"
INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id"
WHERE "projects"."pending_delete" = 'f' AND "project_authorizations"."user_id" = ?
UNION
SELECT "projects"."id" FROM "projects"
WHERE "projects"."visibility_level" IN (?, ?)))
ORDER BY "events"."id" DESC LIMIT ?
Unfortunately, this means that PostgreSQL basically builds up a big list of "projects the user has access to" into a hash table containing, in one example, some 287427 entries (most coming from the seemingly simple "select projects.id from projects where projects.visibility_level in (20, 10)" subquery which returns 287426 IDs). What we would much rather do is to step through the events for the specified author and extract the projects which are seen for that author and then pull out those projects, allowing us to avoid looking at or considering most of the projects which match the visibility_level IN (20, 10) constraint.
The below query reworks this by using an EXISTS instead of IN to check project_authorizations, avoiding the need to build up the big list of projects with visibility_level IN (20, 10):
select *
FROM
"events"
JOIN "projects" ON "projects"."id" = "events"."project_id" AND "projects"."pending_delete" = 'f'
LEFT OUTER JOIN "routes" ON "routes"."source_id" = "projects"."id" AND "routes"."source_type" = 'Project'
LEFT OUTER JOIN "namespaces" ON "namespaces"."id" = "projects"."namespace_id" AND "namespaces"."deleted_at" IS NULL
LEFT OUTER JOIN "users" ON "users"."id" = "events"."author_id"
WHERE
("events"."author_id" IS NOT NULL)
AND "events"."author_id" = ?
AND (projects.visibility_level in (?, ?)
OR
exists (select 1 from project_authorizations auth where user_id = ? and auth.project_id = projects.id))
order by events.id desc limit ?
Note that some of the LEFT OUTER JOINs were turned into INNER JOINs- that's because the WHERE clause included references to columns in those tables, which would mean that any rows not matching the join wouldn't be returned anyway, so there is no need to use a LEFT JOIN.
The above performs much better, but still involves extracting out all events for the author, even though this query is really looking for a "top count". If we adjust this query, in a similar manner recommended elsewhere, to use "created_at" instead of "id" for the sort and then add an index across (author_id, created_at), the query should pick up on that new index and use it to drive the entire query, which would also remove the need for PostgreSQL to perform a 'sort' step at the end of the query to pick out the top 20 and result in excellent performance.