Skip to content

Add index on events table on project_id, created_at to support Group::AnalyticsController

Gregory Stark requested to merge index-for-groups-analytic-controller-4899 into master

Group::AnalyticsController does a bunch of queries for recent events on a set of projects. For each one we were previously doing a nested loop to retrieve all events for those projects but we only need the last week so for long-lived projects it was very inefficient. Add an index on project_id,created_at to allow fetching only the recent events. It's still an unnecessarily large number of queries each of which does a nested loop from many projects which can have many events but it looks like this will be quite a bit faster, at least for projects older than a month or two.

Note that Group::AnalyticsController is EE-only but I don't see any compelling reason to have the schema diverge here.

stark@tweedle:~/gitlab/gdk/gitlab-development-kit/gitlab$ rake db:migrate
== 20180212121259 IndexForGroupsAnalyticController4899: migrating =============
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:events, [:project_id, :created_at], {:algorithm=>:concurrently})
   -> 0.2084s
== 20180212121259 IndexForGroupsAnalyticController4899: migrated (0.2091s) ====
== 20180212121259 IndexForGroupsAnalyticController4899: reverting =============
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0008s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- remove_index(:events, {:algorithm=>:concurrently, :column=>[:project_id, :created_at]})
   -> 0.0156s
== 20180212121259 IndexForGroupsAnalyticController4899: reverted (0.0172s) ====

The new plan:

gitlabhq_production=# explain (analyze,buffers) SELECT COUNT(*) AS count_all, "events"."author_id" AS events_author_id FROM "events" WHERE (action = 5 OR (target_type IN ('MergeRequest','Issue') AND action IN (1,3,7)) OR (target_type = 'Note' AND action = 6)) AND (created_at > '2018-02-05') AND "events"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 9970) AND "events"."target_type" = 'Issue' AND "events"."action" = 3 GROUP BY "events"."author_id";
                                                                                                                                   QUERY PLAN                                                                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=121.69..121.71 rows=1 width=12) (actual time=0.740..0.740 rows=0 loops=1)
   Group Key: events.author_id
   Buffers: shared hit=560
   ->  Sort  (cost=121.69..121.69 rows=1 width=4) (actual time=0.740..0.740 rows=0 loops=1)
         Sort Key: events.author_id
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=560
         ->  Nested Loop  (cost=1.00..121.68 rows=1 width=4) (actual time=0.729..0.729 rows=0 loops=1)
               Buffers: shared hit=560
               ->  Index Scan using index_projects_on_namespace_id on projects  (cost=0.43..38.59 rows=18 width=4) (actual time=0.013..0.137 rows=113 loops=1)
                     Index Cond: (namespace_id = 9970)
                     Buffers: shared hit=103
               ->  Index Scan using stark_tmp_issue_4899 on events  (cost=0.57..4.61 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=113)
                     Index Cond: ((project_id = projects.id) AND (created_at > '2018-02-05 00:00:00+00'::timestamp with time zone))
                     Filter: (((target_type)::text = 'Issue'::text) AND (action = 3) AND ((action = 5) OR (((target_type)::text = ANY ('{MergeRequest,Issue}'::text[])) AND (action = ANY ('{1,3,7}'::integer[]))) OR (((target_type)::text = 'Note'::text) AND (action = 6))))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=457
 Planning time: 1.238 ms
 Execution time: 0.792 ms
(19 rows)

The old plan was (note the different "Index Cond" on index_events_on_project_id_and_id lacking the created_at column:

gitlabhq_production=# explain SELECT COUNT(*) AS count_all, "events"."author_id" AS events_author_id FROM "events" WHERE (action = 5 OR (target_type IN ('MergeRequest','Issue') AND action IN (1,3,7)) OR (target_type = 'Note' AND action = 6)) AND (created_at > '2018-02-05') AND "events"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 9970) AND "events"."target_type" = 'Issue' AND "events"."action" = 3 GROUP BY "events"."author_id";
                                                                                                                                                                      QUERY PLAN                                                                                                                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=39008.12..39008.14 rows=1 width=12)
   Group Key: events.author_id
   ->  Sort  (cost=39008.12..39008.13 rows=1 width=4)
         Sort Key: events.author_id
         ->  Nested Loop  (cost=1.00..39008.11 rows=1 width=4)
               ->  Index Scan using index_projects_on_namespace_id on projects  (cost=0.43..30.34 rows=19 width=4)
                     Index Cond: (namespace_id = 9970)
               ->  Index Scan using index_events_on_project_id_and_id on events  (cost=0.57..2051.45 rows=1 width=8)
                     Index Cond: (project_id = projects.id)
                     Filter: ((created_at > '2018-02-05 00:00:00+00'::timestamp with time zone) AND ((target_type)::text = 'Issue'::text) AND (action = 3) AND ((action = 5) OR (((target_type)::text = ANY ('{MergeRequest,Issue}'::text[])) AND (action = ANY ('{1,3,7}'::integer[]))) OR (((target_type)::text = 'Note'::text) AND (action = 6))))
(10 rows)

Database Checklist

When adding migrations:

  • Updated db/schema.rb
  • Added a down method so the migration can be reverted
  • Added the output of the migration(s) to the MR body
  • Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)

When adding or modifying queries to improve performance:

  • Included data that shows the performance improvement, preferably in the form of a benchmark
  • Included the output of EXPLAIN (ANALYZE, BUFFERS) of the relevant queries

When adding foreign keys to existing tables:

  • Included a migration to remove orphaned rows in the source table before adding the foreign key
  • Removed any instances of dependent: ... that may no longer be necessary

When adding tables:

  • Ordered columns based on the Ordering Table Columns guidelines
  • Added foreign keys to any columns pointing to data in other tables
  • Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs

When removing columns, tables, indexes or other structures:

  • Removed these in a post-deployment migration
  • Made sure the application no longer uses (or ignores) these structures

General Checklist

Edited by Gregory Stark

Merge request reports