Skip to content

`project_authorizations` table may need an index for project ID

I was benchmarking the load times for gitlab-com/www-gitlab-com!2497 (merged) and noticed the call to raw_participants was slow. In the SQL query log, you can see 6 slow queries:

D, [2017-01-30T04:53:48.865179 #41983] DEBUG -- :   User Load (739.3ms)  SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1  ORDER BY "users"."id" DESC  [["project_id", 554859]]
D, [2017-01-30T04:53:49.522571 #41983] DEBUG -- :   User Load (646.9ms)  SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1  ORDER BY "users"."id" DESC  [["project_id", 278964]]
D, [2017-01-30T04:53:50.091217 #41983] DEBUG -- :   User Load (557.8ms)  SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1  ORDER BY "users"."id" DESC  [["project_id", 250833]]
D, [2017-01-30T04:53:50.677502 #41983] DEBUG -- :   User Load (574.4ms)  SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1  ORDER BY "users"."id" DESC  [["project_id", 13083]]
D, [2017-01-30T04:53:51.234828 #41983] DEBUG -- :   User Load (546.0ms)  SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1  ORDER BY "users"."id" DESC  [["project_id", 313079]]
D, [2017-01-30T04:53:51.789168 #41983] DEBUG -- :   User Load (551.4ms)  SELECT "users".* FROM "users" INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = $1  ORDER BY "users"."id" DESC  [["project_id", 7764]]

Note that the MR calls @all, which causes all the team members to be loaded: https://gitlab.com/gitlab-org/gitlab-ce/blob/v8.16.3/lib/banzai/reference_parser/user_parser.rb#L102-103

When I ran EXPLAIN ANALYZE, you can see why an index to the project ID would be helpful:

gitlabhq_production=> EXPLAIN ANALYZE SELECT * from users INNER JOIN "project_authorizations" ON "users"."id" = "project_authorizations"."user_id" WHERE "project_authorizations"."project_id" = 554859 ORDER BY "users"."id" DESC;

                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=92545.96..92546.02 rows=24 width=721) (actual time=585.958..586.203 rows=101 loops=1)
   Sort Key: users.id DESC
   Sort Method: quicksort  Memory: 188kB
   ->  Nested Loop  (cost=0.42..92545.41 rows=24 width=721) (actual time=4.877..585.331 rows=101 loops=1)
         ->  Seq Scan on project_authorizations  (cost=0.00..92342.55 rows=24 width=12) (actual time=4.859..583.635 rows=101 loops=1)
               Filter: (project_id = 554859)
               Rows Removed by Filter: 4732250  <---- OUCH
         ->  Index Scan using users_pkey on users  (cost=0.42..8.44 rows=1 width=705) (actual time=0.011..0.012 rows=1 loops=101)
               Index Cond: (id = project_authorizations.user_id)
 Planning time: 0.485 ms
 Execution time: 586.647 ms
(11 rows)

Notice the number of rows removed by the filter. Creating an index to the project ID would have helped in this case.

Thoughts, @ahmadsherif and @yorickpeterse?