Skip to content

Add partial indexes on todos to handle users with many todos

This adds a pair of partial indexes on todos on columns user_id, id, one for state=done and one for state=pending. This solves the performance problem Sean was complaining about in https://gitlab.com/gitlab-org/gitlab-ce/issues/44078

It has the unfortunate side effect of making updates to the state field on todos non-HOT updates which may significantly increase the amount of vacuuming this table needs.

Database Checklist

When adding migrations:

  • Updated db/schema.rb
  • Added a down method so the migration can be reverted
  • Added the output of the migration(s) to the MR body
  • Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)

When adding or modifying queries to improve performance:

  • Included data that shows the performance improvement, preferably in the form of a benchmark
  • Included the output of EXPLAIN (ANALYZE, BUFFERS) of the relevant queries

When adding foreign keys to existing tables:

  • Included a migration to remove orphaned rows in the source table before adding the foreign key
  • Removed any instances of dependent: ... that may no longer be necessary

When adding tables:

  • Ordered columns based on the Ordering Table Columns guidelines
  • Added foreign keys to any columns pointing to data in other tables
  • Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs

When removing columns, tables, indexes or other structures:

  • Removed these in a post-deployment migration
  • Made sure the application no longer uses (or ignores) these structures

General Checklist

Migration:

stark@tweedle:~/gitlab/gdk/gitlab-development-kit/gitlab$ bundle exec rake db:migrate
WARNING: This version of GitLab depends on gitlab-shell 6.0.4, but you're running 6.0.3. Please update gitlab-shell.
== 20180309160427 AddPartialIndexesOnTodos: migrating =========================
-- index_exists?(:todos, [:user_id, :id], {:name=>"index_todos_on_user_id_and_id_pending"})
   -> 0.0047s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- add_index(:todos, [:user_id, :id], {:where=>"state='pending'", :name=>"index_todos_on_user_id_and_id_pending", :algorithm=>:concurrently})
   -> 0.3097s
-- index_exists?(:todos, [:user_id, :id], {:name=>"index_todos_on_user_id_and_id_done"})
   -> 0.0063s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:todos, [:user_id, :id], {:where=>"state='done'", :name=>"index_todos_on_user_id_and_id_done", :algorithm=>:concurrently})
   -> 0.0184s
== 20180309160427 AddPartialIndexesOnTodos: migrated (0.3407s) ================

Rollback:

stark@tweedle:~/gitlab/gdk/gitlab-development-kit/gitlab$ bundle exec rake db:rollback
WARNING: This version of GitLab depends on gitlab-shell 6.0.4, but you're running 6.0.3. Please update gitlab-shell.
== 20180309160427 AddPartialIndexesOnTodos: reverting =========================
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0008s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- remove_index(:todos, {:where=>"state='pending'", :name=>"index_todos_on_user_id_and_id_pending", :algorithm=>:concurrently, :column=>[:user_id, :id]})
   -> 0.0140s
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0007s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- remove_index(:todos, {:where=>"state='done'", :name=>"index_todos_on_user_id_and_id_done", :algorithm=>:concurrently, :column=>[:user_id, :id]})
   -> 0.0055s
== 20180309160427 AddPartialIndexesOnTodos: reverted (0.0224s) ================

The query we're having trouble with plan before:

gitlabhq_production=# explain analyze SELECT "todos".* FROM "todos" INNER JOIN "projects" ON "projects"."id" = "todos"."project_id" WHERE "todos"."user_id" = 443319 AND ("todos"."state" IN ('pending')) AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 443319 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (10,20)) ORDER BY "todos"."id" DESC LIMIT 20 OFFSET 0;
                                                                                                    QUERY PLAN                                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.86..3527.40 rows=20 width=99) (actual time=0.281..6767.368 rows=13 loops=1)
   ->  Nested Loop  (cost=0.86..345601.08 rows=1960 width=99) (actual time=0.280..6767.360 rows=13 loops=1)
         ->  Index Scan Backward using todos_pkey on todos  (cost=0.43..321108.38 rows=3530 width=99) (actual time=0.241..6766.991 rows=13 loops=1)
               Filter: ((user_id = 443319) AND ((state)::text = 'pending'::text))
               Rows Removed by Filter: 8281639
         ->  Index Scan using projects_pkey on projects  (cost=0.43..6.93 rows=1 width=4) (actual time=0.022..0.022 rows=1 loops=13)
               Index Cond: (id = todos.project_id)
               Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{10,20}'::integer[])))
               SubPlan 1
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.56..3.58 rows=1 width=0) (actual time=0.010..0.010 rows=1 loops=13)
                       Index Cond: ((user_id = 443319) AND (project_id = projects.id))
                       Heap Fetches: 0
               SubPlan 2
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1  (cost=0.56..9.06 rows=94 width=4) (never executed)
                       Index Cond: (user_id = 443319)
                       Heap Fetches: 0
 Planning time: 0.567 ms
 Execution time: 6767.479 ms
(18 rows)

And after:

gitlabhq_production=# explain analyze SELECT "todos".* FROM "todos" INNER JOIN "projects" ON "projects"."id" = "todos"."project_id" WHERE "todos"."user_id" = 443319 AND ("todos"."state" IN ('pending')) AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 443319 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (10,20)) ORDER BY "todos"."id" DESC LIMIT 20 OFFSET 0;
                                                                                                    QUERY PLAN                                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.86..382.17 rows=20 width=99) (actual time=0.032..0.072 rows=6 loops=1)
   ->  Nested Loop  (cost=0.86..26502.21 rows=1390 width=99) (actual time=0.030..0.069 rows=6 loops=1)
         ->  Index Scan Backward using index_todos_on_user_id_and_id_pending on todos  (cost=0.43..4532.37 rows=2496 width=99) (actual time=0.011..0.017 rows=6 loops=1)
               Index Cond: (user_id = 443319)
         ->  Index Scan using projects_pkey on projects  (cost=0.43..8.79 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=6)
               Index Cond: (id = todos.project_id)
               Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (visibility_level = ANY ('{10,20}'::integer[])))
               SubPlan 1
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations  (cost=0.43..4.45 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=6)
                       Index Cond: ((user_id = 443319) AND (project_id = projects.id))
                       Heap Fetches: 0
               SubPlan 2
                 ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations project_authorizations_1  (cost=0.43..5.80 rows=78 width=4) (never executed)
                       Index Cond: (user_id = 443319)
                       Heap Fetches: 0
 Planning time: 0.603 ms
 Execution time: 0.120 ms
(17 rows)
Edited by Yorick Peterse

Merge request reports