Improve query performance of merged_at filter
What does this MR do?
This MR significantly improve the query performance of productivity analytics and throughput analytics features. Previously we added target_project_id
to merge_request_metrics
(denormalization) and the BG migration (!37713 (merged)) that backfills the data is close to finish.
The change is happening in the MergeRequestsFinder
: If we join metrics
, add an extra condition: merge_requests.target_project_id = merge_request_metrics.target_project_id
Migration (index)
exec create index i1 on merge_request_metrics (target_project_id, merged_at)
Session: webui-i769
The query has been executed. Duration: 2.697 min
Up:
== 20200812112204 AddIndexToMrMetricsTargetProjectId: migrating ===============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:merge_request_metrics, [:target_project_id, :merged_at], {:name=>"index_merge_request_metrics_on_target_project_id_merged_at", :algorithm=>:concurrently})
-> 0.0054s
-- add_index(:merge_request_metrics, [:target_project_id, :merged_at], {:name=>"index_merge_request_metrics_on_target_project_id_merged_at", :algorithm=>:concurrently})
-> 0.0178s
== 20200812112204 AddIndexToMrMetricsTargetProjectId: migrated (0.0238s) ======
Down:
== 20200812112204 AddIndexToMrMetricsTargetProjectId: reverting ===============
-- transaction_open?()
-> 0.0000s
-- indexes(:merge_request_metrics)
-> 0.0081s
-- remove_index(:merge_request_metrics, {:algorithm=>:concurrently, :name=>"index_merge_request_metrics_on_target_project_id_merged_at"})
-> 0.0021s
== 20200812112204 AddIndexToMrMetricsTargetProjectId: reverted (0.0107s) ======
Queries
MR (throughput) analytics
Before:
SELECT COUNT(*)
FROM "merge_requests"
INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
WHERE "merge_requests"."target_project_id" = 278964
AND "merge_request_metrics"."merged_at" >= '2019-06-30 23:00:00'
AND "merge_request_metrics"."merged_at" <= '2019-07-30 23:00:00'
After:
SELECT COUNT(*)
FROM "merge_requests"
INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
WHERE "merge_requests"."target_project_id" = 278964
AND merge_request_metrics.target_project_id = merge_requests.target_project_id
AND "merge_request_metrics"."merged_at" >= '2019-06-30 23:00:00'
AND "merge_request_metrics"."merged_at" <= '2019-07-30 23:00:00'
Productivity Analytics (group level)
Before:
SELECT DATE_PART('day', merge_request_metrics.first_comment_at - merge_requests.created_at)*24+DATE_PART('hour', merge_request_metrics.first_comment_at - merge_requests.created_at) AS metric,
count(*) AS mr_count
FROM "merge_request_metrics"
INNER JOIN "merge_requests" ON "merge_requests"."id" = "merge_request_metrics"."merge_request_id"
WHERE "merge_request_metrics"."merge_request_id" IN
(SELECT "merge_requests"."id"
FROM "merge_requests"
INNER JOIN "projects" ON "projects"."id" = "merge_requests"."target_project_id"
INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" IN
(WITH RECURSIVE "base_and_descendants" AS (
(SELECT "namespaces".*
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND "namespaces"."id" = 9970)
UNION
(SELECT "namespaces".*
FROM "namespaces",
"base_and_descendants"
WHERE "namespaces"."type" = 'Group'
AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "namespaces"."id"
FROM "base_and_descendants" AS "namespaces")
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 2
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (0,
10,
20))
AND ("project_features"."merge_requests_access_level" > 0
OR "project_features"."merge_requests_access_level" IS NULL)
AND ("merge_requests"."state_id" IN (3))
AND "merge_request_metrics"."merged_at" >= '2020-07-17 14:40:20.017161'
AND "merge_request_metrics"."merged_at" <= '2020-08-13 23:59:59'
ORDER BY "merge_requests"."updated_at" DESC, "merge_requests"."id" DESC)
GROUP BY DATE_PART('day', merge_request_metrics.first_comment_at - merge_requests.created_at)*24+DATE_PART('hour', merge_request_metrics.first_comment_at - merge_requests.created_at)
Note: the uncached query times out on PRD.
After:
SELECT DATE_PART('day', merge_request_metrics.first_comment_at - merge_requests.created_at)*24+DATE_PART('hour', merge_request_metrics.first_comment_at - merge_requests.created_at) AS metric,
count(*) AS mr_count
FROM "merge_request_metrics"
INNER JOIN "merge_requests" ON "merge_requests"."id" = "merge_request_metrics"."merge_request_id"
WHERE "merge_request_metrics"."merge_request_id" IN
(SELECT "merge_requests"."id"
FROM "merge_requests"
INNER JOIN "projects" ON "projects"."id" = "merge_requests"."target_project_id"
INNER JOIN "merge_request_metrics" ON "merge_request_metrics"."merge_request_id" = "merge_requests"."id"
LEFT JOIN project_features ON projects.id = project_features.project_id
WHERE "projects"."namespace_id" IN
(WITH RECURSIVE "base_and_descendants" AS (
(SELECT "namespaces".*
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND "namespaces"."id" = 9970)
UNION
(SELECT "namespaces".*
FROM "namespaces",
"base_and_descendants"
WHERE "namespaces"."type" = 'Group'
AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "namespaces"."id"
FROM "base_and_descendants" AS "namespaces")
AND (EXISTS
(SELECT 1
FROM "project_authorizations"
WHERE "project_authorizations"."user_id" = 2
AND (project_authorizations.project_id = projects.id))
OR projects.visibility_level IN (0,
10,
20))
AND ("project_features"."merge_requests_access_level" > 0
OR "project_features"."merge_requests_access_level" IS NULL)
AND ("merge_requests"."state_id" IN (3))
AND "merge_request_metrics"."merged_at" >= '2020-07-17 14:40:20.017161'
AND "merge_request_metrics"."merged_at" <= '2020-08-13 23:59:59'
AND "merge_requests"."target_project_id" = "merge_request_metrics"."target_project_id"
ORDER BY "merge_requests"."updated_at" DESC, "merge_requests"."id" DESC)
GROUP BY DATE_PART('day', merge_request_metrics.first_comment_at - merge_requests.created_at)*24+DATE_PART('hour', merge_request_metrics.first_comment_at - merge_requests.created_at)
Screenshots
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team