`project_authorizations` table may need an index for project ID
I was benchmarking the load times for gitlab-com/www-gitlab-com!2497 (merged) and noticed the call to raw_participants was slow. In the SQL query log, you can see 6 slow queries:
D, [2017-01-30T04:53:48.865179 #41983] DEBUG -- : User Load (739.3ms) SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1 ORDER BY "users"."id" DESC [["project_id", 554859]]
D, [2017-01-30T04:53:49.522571 #41983] DEBUG -- : User Load (646.9ms) SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1 ORDER BY "users"."id" DESC [["project_id", 278964]]
D, [2017-01-30T04:53:50.091217 #41983] DEBUG -- : User Load (557.8ms) SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1 ORDER BY "users"."id" DESC [["project_id", 250833]]
D, [2017-01-30T04:53:50.677502 #41983] DEBUG -- : User Load (574.4ms) SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1 ORDER BY "users"."id" DESC [["project_id", 13083]]
D, [2017-01-30T04:53:51.234828 #41983] DEBUG -- : User Load (546.0ms) SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1 ORDER BY "users"."id" DESC [["project_id", 313079]]
D, [2017-01-30T04:53:51.789168 #41983] DEBUG -- : User Load (551.4ms) SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1 ORDER BY "users"."id" DESC [["project_id", 7764]]
Note that the MR calls @all, which causes all the team members to be loaded: https://gitlab.com/gitlab-org/gitlab-ce/blob/v8.16.3/lib/banzai/reference_parser/user_parser.rb#L102-103
When I ran EXPLAIN ANALYZE, you can see why an index to the project ID would be helpful:
gitlabhq_production=> EXPLAIN ANALYZE SELECT * from users INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = 554859 ORDER BY "users"."id" DESC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=92545.96..92546.02 rows=24 width=721) (actual time=585.958..586.203 rows=101 loops=1)
Sort Key: users.id DESC
Sort Method: quicksort Memory: 188kB
-> Nested Loop (cost=0.42..92545.41 rows=24 width=721) (actual time=4.877..585.331 rows=101 loops=1)
-> Seq Scan on project_authorizations (cost=0.00..92342.55 rows=24 width=12) (actual time=4.859..583.635 rows=101 loops=1)
Filter: (project_id = 554859)
Rows Removed by Filter: 4732250 <---- OUCH
-> Index Scan using users_pkey on users (cost=0.42..8.44 rows=1 width=705) (actual time=0.011..0.012 rows=1 loops=101)
Index Cond: (id = project_authorizations.user_id)
Planning time: 0.485 ms
Execution time: 586.647 ms
(11 rows)
Notice the number of rows removed by the filter. Creating an index to the project ID would have helped in this case.
Thoughts, @ahmadsherif and @yorickpeterse?