Geo: Query for empty repository state is slow
I saw this query in production:
SELECT "projects"."id" FROM "projects" LEFT OUTER JOIN "routes" ON "routes"."source_id" = "projects"."id" AND "routes"."source_type" = 'Project' LEFT OUTER JOIN "project_repository_states" ON
"projects"."id" = "project_repository_states"."project_id" WHERE "project_repository_states"."project_id" IS NULL AND "projects"."repository_storage" = 'nfs-file05' LIMIT 1000
Which leads to this:
gitlabhq_production=# explain analyze SELECT "projects"."id" FROM "projects" LEFT OUTER JOIN "routes" ON "routes"."source_id" = "projects"."id" AND "routes"."source_type" = 'Project' LEFT OUTER JOIN "project_repository_states" ON
gitlabhq_production-# "projects"."id" = "project_repository_states"."project_id" WHERE "project_repository_states"."project_id" IS NULL AND "projects"."repository_storage" = 'nfs-file03' LIMIT 1000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.99..295068.83 rows=8 width=4) (actual time=2286.265..6698.175 rows=9 loops=1)
-> Merge Anti Join (cost=0.99..295068.83 rows=8 width=4) (actual time=2286.265..6698.158 rows=9 loops=1)
Merge Cond: (projects.id = project_repository_states.project_id)
-> Index Only Scan using idx_projects_on_repository_storage_last_repository_updated_at on projects (cost=0.56..154803.87 rows=281930 width=4) (actual time=0.038..2089.042 rows=283253 loops=1)
Index Cond: (repository_storage = 'nfs-file03'::text)
Heap Fetches: 89489
-> Index Only Scan using index_project_repository_states_on_project_id on project_repository_states (cost=0.43..123297.25 rows=5377469 width=4) (actual time=0.038..3240.574 rows=3195152 loops=1)
Heap Fetches: 103476
Planning time: 5.205 ms
Execution time: 6698.259 ms
(10 rows)