Skip to content

Optimize Project related count with slack service

What does this MR do?

Optimize queries produced by:

projects_slack_notifications_active: distinct_count(::Project.with_slack_service.where(time_period), :creator_id),
projects_slack_slash_active: distinct_count(::Project.with_slack_slash_commands_service.where(time_period), :creator_id),
protected_branches: distinct_count(::Project.with_protected_branches.where(time_period), :creator_id)

Note: the below will only focus on the SlackService, but should also ring true for handling the optimizing of the SlackSlashCommandsService as well.

query with time constraint and batching
SELECT
    COUNT(DISTINCT "projects"."creator_id")
FROM
    "projects"
    INNER JOIN "services" ON "services"."project_id" = "projects"."id"
        AND "services"."type" = 'SlackService'
WHERE
    "projects"."created_at" BETWEEN '2020-02-07 17:49:18.737141'
    AND '2020-03-06 17:49:18.737185'
    AND "projects"."creator_id" >= 810000
    AND "projects"."creator_id" < 811250;
query with no time constraint and batching
SELECT
    COUNT(DISTINCT "projects"."creator_id")
FROM
    "projects"
    INNER JOIN "services" ON "services"."project_id" = "projects"."id"
        AND "services"."type" = 'SlackService'
WHERE
    "projects"."creator_id" >= 810000
    AND "projects"."creator_id" < 811250;

explain data from database-lab

  • creator_id between 810_000 AND 811_250

Before

Observations

  1. filter applied when join between project and services was performed (no index)
Index Cond: (services.project_id = projects.id)
Filter: ((services.type)::text = 'SlackService'::text)
  • combat this with adding index
CREATE INDEX index_services_on_type_and_project_id
ON services(project_id, type);
Index Cond: ((services.project_id = projects.id) AND (services.type = 'SlackService'::text))
  1. Filter applied between projects.id and projects.creator_id
Index Cond: (projects.id = services.project_id)
Filter: ((projects.creator_id >= 1) AND (projects.creator_id < 100000))
  • combat this with adding index
CREATE INDEX index_projects_on_creator_id_and_created_at
ON projects(creator_id, created_at);
Index Cond: ((projects.creator_id >= 810000) AND (projects.creator_id < 811250))
  1. MAX/MIN calculations
query
SELECT
    MAX("projects"."creator_id")
FROM
    "projects"
WHERE
    "projects"."created_at" BETWEEN '2020-02-07 06:22:09.650617'
    AND '2020-03-06 06:22:09.650886' 

Plan

  • Add these indexes
CREATE INDEX index_services_on_type_and_project_id
ON services(project_id, type);

CREATE INDEX index_projects_on_creator_id_and_created_at
ON projects(creator_id, created_at);

After only index conditions are hit

Timing

After the index for batch counting takes 1_372 seconds pessimistic

  • 5.5 million users,
  • with 1_250 batch sizes
  • 5.5M/1_250 = 4_400 loops
  • Time: < 312 ms ( cold cache with no time constraint )

Migration output

10:08 $ rails db:migrate:up VERSION=20200306192548
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: migrating =============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:services, [:project_id, :type], {:algorithm=>:concurrently})
   -> 0.0032s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:services, [:project_id, :type], {:algorithm=>:concurrently})
   -> 0.0048s
-- execute("RESET ALL")
   -> 0.0004s
-- transaction_open?()
   -> 0.0000s
-- indexes(:services)
   -> 0.0019s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- remove_index(:services, {:algorithm=>:concurrently, :name=>"index_services_on_project_id"})
   -> 0.0016s
-- execute("RESET ALL")
   -> 0.0004s
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: migrated (0.0135s) ====

✔ ~/projects/gdk/gitlab [208887-optimize-project-counters-with-slack-service ↓·1↑·359|●1✚ 3⚑ 1]
11:29 $ rails db:migrate:down VERSION=20200306192548
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: reverting =============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:services, :project_id, {:name=>"index_services_on_project_id", :algorithm=>:concurrently})
   -> 0.0033s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:services, :project_id, {:name=>"index_services_on_project_id", :algorithm=>:concurrently})
   -> 0.0055s
-- execute("RESET ALL")
   -> 0.0004s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:services, [:project_id, :type], {:algorithm=>:concurrently})
   -> 0.0018s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- remove_index(:services, {:algorithm=>:concurrently, :column=>[:project_id, :type]})
   -> 0.0034s
-- execute("RESET ALL")
   -> 0.0004s
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: reverted (0.0159s) ====

10:01 $ rails db:migrate:up VERSION=20200306193236
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: migrating ========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:projects, [:creator_id, :created_at], {:algorithm=>:concurrently})
   -> 0.0118s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:projects, [:creator_id, :created_at], {:algorithm=>:concurrently})
   -> 0.0045s
-- execute("RESET ALL")
   -> 0.0004s
-- transaction_open?()
   -> 0.0000s
-- indexes(:projects)
   -> 0.0109s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- remove_index(:projects, {:algorithm=>:concurrently, :name=>"index_projects_on_creator_id"})
   -> 0.0023s
-- execute("RESET ALL")
   -> 0.0004s
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: migrated (0.0316s)

✔ ~/projects/gdk/gitlab [208887-optimize-project-counters-with-slack-service ↓·1↑·359|●1✚ 2⚑ 1]
12:10 $ rails db:migrate:down VERSION=20200306193236
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: reverting ========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:projects, :creator_id, {:name=>"index_projects_on_creator_id", :algorithm=>:concurrently})
   -> 0.0117s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:projects, :creator_id, {:name=>"index_projects_on_creator_id", :algorithm=>:concurrently})
   -> 0.0044s
-- execute("RESET ALL")
   -> 0.0004s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:projects, [:creator_id, :created_at], {:algorithm=>:concurrently})
   -> 0.0118s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:projects, {:algorithm=>:concurrently, :column=>[:creator_id, :created_at]})
   -> 0.0121s
-- execute("RESET ALL")
   -> 0.0004s
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: reverted (0.0422s)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #208887 (closed)

/cc @a_akgun @alinamihaila @jeromezng

Edited by Doug Stull

Merge request reports

Loading