Optimize Project related count with slack service
What does this MR do?
Optimize queries produced by:
- https://gitlab.com/gitlab-org/gitlab/-/blob/908902d1d4f6fbcd7780150c780837fd4fa8b301/ee/lib/ee/gitlab/usage_data.rb#L225
- https://gitlab.com/gitlab-org/gitlab/-/blob/908902d1d4f6fbcd7780150c780837fd4fa8b301/ee/lib/ee/gitlab/usage_data.rb#L226
- https://gitlab.com/gitlab-org/gitlab/-/blob/908902d1d4f6fbcd7780150c780837fd4fa8b301/ee/lib/ee/gitlab/usage_data.rb#L242
projects_slack_notifications_active: distinct_count(::Project.with_slack_service.where(time_period), :creator_id),
projects_slack_slash_active: distinct_count(::Project.with_slack_slash_commands_service.where(time_period), :creator_id),
protected_branches: distinct_count(::Project.with_protected_branches.where(time_period), :creator_id)
Note: the below will only focus on the SlackService, but should also ring true for handling the optimizing of the SlackSlashCommandsService as well.
query with time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "services" ON "services"."project_id" = "projects"."id"
AND "services"."type" = 'SlackService'
WHERE
"projects"."created_at" BETWEEN '2020-02-07 17:49:18.737141'
AND '2020-03-06 17:49:18.737185'
AND "projects"."creator_id" >= 810000
AND "projects"."creator_id" < 811250;
query with no time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "services" ON "services"."project_id" = "projects"."id"
AND "services"."type" = 'SlackService'
WHERE
"projects"."creator_id" >= 810000
AND "projects"."creator_id" < 811250;
explain data from database-lab
-
creator_idbetween810_000 AND 811_250
Before
- with time constraint - https://explain.depesz.com/s/6yws
- no time constraint - https://explain.depesz.com/s/MoHy
Observations
- filter applied when join between project and services was performed (no index)
Index Cond: (services.project_id = projects.id)
Filter: ((services.type)::text = 'SlackService'::text)
- combat this with adding index
CREATE INDEX index_services_on_type_and_project_id
ON services(project_id, type);
- explain now shows index only condition hit(improvement) - https://explain.depesz.com/s/0lWo
Index Cond: ((services.project_id = projects.id) AND (services.type = 'SlackService'::text))
- Filter applied between
projects.idandprojects.creator_id
Index Cond: (projects.id = services.project_id)
Filter: ((projects.creator_id >= 1) AND (projects.creator_id < 100000))
- combat this with adding index
CREATE INDEX index_projects_on_creator_id_and_created_at
ON projects(creator_id, created_at);
- explain now shows index only condition hit(improvement)
- time constraint - https://explain.depesz.com/s/ORiI
- no time constraint - https://explain.depesz.com/s/JGOM
Index Cond: ((projects.creator_id >= 810000) AND (projects.creator_id < 811250))
- MAX/MIN calculations
query
SELECT
MAX("projects"."creator_id")
FROM
"projects"
WHERE
"projects"."created_at" BETWEEN '2020-02-07 06:22:09.650617'
AND '2020-03-06 06:22:09.650886'
- Explain results(within thresholds:
1.925 ms) - https://explain.depesz.com/s/hH2W
Plan
- Add these indexes
CREATE INDEX index_services_on_type_and_project_id
ON services(project_id, type);
CREATE INDEX index_projects_on_creator_id_and_created_at
ON projects(creator_id, created_at);
After only index conditions are hit
- with time constraint - https://explain.depesz.com/s/ORiI
- no time constraint - https://explain.depesz.com/s/JGOM
Timing
After the index for batch counting takes 1_372 seconds pessimistic
- 5.5 million users,
- with 1_250 batch sizes
- 5.5M/1_250 = 4_400 loops
- Time: < 312 ms ( cold cache with no time constraint )
Migration output
10:08 $ rails db:migrate:up VERSION=20200306192548
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: migrating =============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:services, [:project_id, :type], {:algorithm=>:concurrently})
-> 0.0032s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:services, [:project_id, :type], {:algorithm=>:concurrently})
-> 0.0048s
-- execute("RESET ALL")
-> 0.0004s
-- transaction_open?()
-> 0.0000s
-- indexes(:services)
-> 0.0019s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- remove_index(:services, {:algorithm=>:concurrently, :name=>"index_services_on_project_id"})
-> 0.0016s
-- execute("RESET ALL")
-> 0.0004s
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: migrated (0.0135s) ====
✔ ~/projects/gdk/gitlab [208887-optimize-project-counters-with-slack-service ↓·1↑·359|●1✚ 3⚑ 1]
11:29 $ rails db:migrate:down VERSION=20200306192548
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: reverting =============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:services, :project_id, {:name=>"index_services_on_project_id", :algorithm=>:concurrently})
-> 0.0033s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:services, :project_id, {:name=>"index_services_on_project_id", :algorithm=>:concurrently})
-> 0.0055s
-- execute("RESET ALL")
-> 0.0004s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:services, [:project_id, :type], {:algorithm=>:concurrently})
-> 0.0018s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- remove_index(:services, {:algorithm=>:concurrently, :column=>[:project_id, :type]})
-> 0.0034s
-- execute("RESET ALL")
-> 0.0004s
== 20200306192548 AddIndexOnProjectIdAndTypeToServices: reverted (0.0159s) ====
10:01 $ rails db:migrate:up VERSION=20200306193236
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: migrating ========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:projects, [:creator_id, :created_at], {:algorithm=>:concurrently})
-> 0.0118s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- add_index(:projects, [:creator_id, :created_at], {:algorithm=>:concurrently})
-> 0.0045s
-- execute("RESET ALL")
-> 0.0004s
-- transaction_open?()
-> 0.0000s
-- indexes(:projects)
-> 0.0109s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- remove_index(:projects, {:algorithm=>:concurrently, :name=>"index_projects_on_creator_id"})
-> 0.0023s
-- execute("RESET ALL")
-> 0.0004s
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: migrated (0.0316s)
✔ ~/projects/gdk/gitlab [208887-optimize-project-counters-with-slack-service ↓·1↑·359|●1✚ 2⚑ 1]
12:10 $ rails db:migrate:down VERSION=20200306193236
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: reverting ========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:projects, :creator_id, {:name=>"index_projects_on_creator_id", :algorithm=>:concurrently})
-> 0.0117s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:projects, :creator_id, {:name=>"index_projects_on_creator_id", :algorithm=>:concurrently})
-> 0.0044s
-- execute("RESET ALL")
-> 0.0004s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:projects, [:creator_id, :created_at], {:algorithm=>:concurrently})
-> 0.0118s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:projects, {:algorithm=>:concurrently, :column=>[:creator_id, :created_at]})
-> 0.0121s
-- execute("RESET ALL")
-> 0.0004s
== 20200306193236 AddIndexOnCreatorIdAndCreatedAtToProjects: reverted (0.0422s)
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec -
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to #208887 (closed)
Edited by Doug Stull