Optimize project counters with repositories enabled counter
What does this MR do?
Optimize the query produced by https://gitlab.com/gitlab-org/gitlab/-/blob/908902d1d4f6fbcd7780150c780837fd4fa8b301/ee/lib/ee/gitlab/usage_data.rb#L241:
projects_with_repositories_enabled: distinct_count(::Project.with_repositories_enabled.where(time_period), :creator_id),
query with time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE
"project_features"."repository_access_level" = 20
AND "projects"."created_at" BETWEEN '2020-02-07 21:17:04.618237'
AND '2020-03-06 21:17:04.618286'
AND "projects"."creator_id" >= 810000
AND "projects"."creator_id" < 820000;
query with no time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE
"project_features"."repository_access_level" = 20
AND "projects"."creator_id" >= 810000
AND "projects"."creator_id" < 820000;
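The batching in the two queries above works because the batches partition the `creator_id` range: each creator falls into exactly one batch, so the per-batch distinct counts can be summed. Below is a minimal in-memory sketch of that idea in plain Ruby. It is not the actual `distinct_count` implementation; `Row`, `distinct_creators_in_batch` and `batched_distinct_count` are hypothetical names used only for illustration.

```ruby
# Hypothetical row type standing in for the joined projects/project_features rows.
Row = Struct.new(:creator_id, :repository_access_level)

REPOSITORY_ENABLED = 20 # access level value used in the queries above

# Count distinct creator_ids in [start, finish), mirroring one batched query.
def distinct_creators_in_batch(rows, start, finish)
  rows
    .select { |r| r.repository_access_level == REPOSITORY_ENABLED }
    .select { |r| r.creator_id >= start && r.creator_id < finish }
    .map(&:creator_id)
    .uniq
    .size
end

# Walk the creator_id range in fixed-size steps and sum the per-batch counts.
# Batches do not overlap, so no creator is counted twice.
def batched_distinct_count(rows, batch_size:)
  ids = rows.map(&:creator_id)
  return 0 if ids.empty?

  total = 0
  start = ids.min
  max = ids.max
  while start <= max
    total += distinct_creators_in_batch(rows, start, start + batch_size)
    start += batch_size
  end
  total
end

rows = [
  Row.new(1, 20), Row.new(1, 20), Row.new(2, 0),
  Row.new(7, 20), Row.new(12, 20), Row.new(12, 10)
]
puts batched_distinct_count(rows, batch_size: 5) # => 3
```

Creators 1, 7 and 12 each own at least one project with repositories enabled, and each lands in a different batch, so the sum equals the unbatched distinct count.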
explain data from database-lab
- id between 1 AND 10_000 and creator_id between 1 AND 100_000
Before
- with time constraint - https://explain.depesz.com/s/V5dT
- with no time constraint - https://explain.depesz.com/s/e4fO
Observations
- filter applied when join performed between projects and project_features for the project_id and repository_access_level
Index Cond: (project_features.project_id = projects.id)
Filter: (project_features.repository_access_level = 20)
- combat this by adding a partial index
CREATE INDEX index_project_features_on_project_id_and_repository_access_level_20
ON project_features(project_id)
WHERE project_features.repository_access_level = 20;
- explain now shows only the index condition being hit (improvement) - https://explain.depesz.com/s/obCh
Index Cond: (project_features.project_id = projects.id)
- MAX/MIN calculations
- Query -
SELECT MAX("projects"."creator_id") FROM "projects" WHERE "projects"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886'
- Explain results (no index, but within thresholds) - https://explain.depesz.com/s/SDPC
Plan
- Add this index
CREATE INDEX index_project_features_on_project_id_and_repository_access_level_20
ON project_features(project_id, repository_access_level)
WHERE project_features.repository_access_level = 20;
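As a rough mental model of why a partial index removes the separate Filter step, the sketch below compares a lookup structure built over every row with one built only over rows that satisfy the predicate. This is an in-memory analogy in plain Ruby, not a description of how PostgreSQL stores or scans indexes; `ProjectFeature`, `build_full_index` and `build_partial_index` are hypothetical names.

```ruby
require "set"

# Hypothetical row type; only the two columns relevant to the index are modelled.
ProjectFeature = Struct.new(:project_id, :repository_access_level)

REPOSITORY_ENABLED = 20

# Analogy for an index on project_id alone: every row is indexed, so the
# access level still has to be checked after each lookup (the Filter step).
def build_full_index(features)
  features.each_with_object({}) { |f, idx| idx[f.project_id] = f }
end

# Analogy for the partial index: rows failing the predicate are never indexed,
# so a hit already implies repository_access_level = 20.
def build_partial_index(features)
  features
    .select { |f| f.repository_access_level == REPOSITORY_ENABLED }
    .map(&:project_id)
    .to_set
end

features = [
  ProjectFeature.new(1, 20),
  ProjectFeature.new(2, 0),
  ProjectFeature.new(3, 20)
]
project_ids = [1, 2, 3, 4]

full = build_full_index(features)
partial = build_partial_index(features)

# Lookup plus filter versus lookup alone; both select the same projects.
via_full = project_ids.select { |id| full[id]&.repository_access_level == REPOSITORY_ENABLED }
via_partial = project_ids.select { |id| partial.include?(id) }

p via_full    # => [1, 3]
p via_partial # => [1, 3]
```

The partial structure is also smaller, since it holds only the rows that satisfy the predicate.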
After
Only index conditions are hit
- with time constraint - https://explain.depesz.com/s/kD9P
- with no time constraint - https://explain.depesz.com/s/obCh
Timing
After adding the index, batch counting takes 55 seconds (pessimistic estimate)
- 5.5 million users
- batch size of 1_250
- 5.5M/1_250 = 4_400 loops
- Time: < 1.4s (cold cache)
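The loop count above can be reproduced with a few lines of Ruby. This is only a sketch of the arithmetic and of how consecutive batch bounds line up; `batch_loops` and `batch_bounds` are hypothetical helper names, and the 810_000 starting id is borrowed from the example queries purely for illustration.

```ruby
# Number of batched queries needed to cover total_ids ids (integer ceiling division).
def batch_loops(total_ids, batch_size)
  (total_ids + batch_size - 1) / batch_size
end

# Enumerate the [start, finish) bounds each batched query would use.
def batch_bounds(min_id, max_id, batch_size)
  (min_id..max_id).step(batch_size).map { |start| [start, start + batch_size] }
end

puts batch_loops(5_500_000, 1_250) # => 4400

# Two consecutive batches; the upper bound is exclusive, as in the queries.
p batch_bounds(810_000, 812_499, 1_250)
# => [[810000, 811250], [811250, 812500]]
```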
Migration output
10:10 $ VERBOSE=true be rake db:migrate:up VERSION=20200309140540
== 20200309140540 AddIndexOnProjectIdAndRepositoryAccessLevelToProjectFeatures: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:project_features, :project_id, {:where=>"repository_access_level = 20", :name=>"index_project_features_on_project_id_ral_20", :algorithm=>:concurrently})
-> 0.0026s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:project_features, :project_id, {:where=>"repository_access_level = 20", :name=>"index_project_features_on_project_id_ral_20", :algorithm=>:concurrently})
-> 0.0037s
-- execute("RESET ALL")
-> 0.0004s
== 20200309140540 AddIndexOnProjectIdAndRepositoryAccessLevelToProjectFeatures: migrated (0.0074s)
✔ ~/projects/gdk/gitlab [208887-optimize-project-counters-projects_with_repositories_enabled ↑·129|✚ 1…1⚑ 1]
10:11 $ VERBOSE=true be rake db:migrate:down VERSION=20200309140540
== 20200309140540 AddIndexOnProjectIdAndRepositoryAccessLevelToProjectFeatures: reverting
-- transaction_open?()
-> 0.0000s
-- indexes(:project_features)
-> 0.0026s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:project_features, {:algorithm=>:concurrently, :name=>"index_project_features_on_project_id_ral_20"})
-> 0.0039s
-- execute("RESET ALL")
-> 0.0004s
== 20200309140540 AddIndexOnProjectIdAndRepositoryAccessLevelToProjectFeatures: reverted (0.0075s)
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team
Part of #208887 (closed)
Edited by Doug Stull