Skip to content

BG migration for MR metrics target_project_id

What does this MR do?

This MR adds target_project_id column to merge_request_metrics and keeps it in sync with merge_requests.target_project_id.

The goal of this denormalization is to get better query timings when querying the merged_at column in merge_requests_metrics table. Currently when querying merged_at, the DB needs to process all MR records and join the metrics table for the given project/projects.

Efficient queries on MRs and the merged_at column will be required for MR analytics (throughput) feature: #229045 (closed)

I tested the import / export feature. Importing an older GL export is working, the target_project_id will be set automatically by the before_save callback on MergeRequest::Metrics. I didn't notice N+1 query on merge_requests (finding the target_project_id), probably because we have the inverse_of set up.

Inefficient query:

SELECT count("merge_requests".id)
FROM "merge_requests"
INNER JOIN merge_request_metrics ON merge_request_metrics.merge_request_id=merge_requests.id AND (merged_at > '2020-05-01 00:00') AND (merged_at < '2020-05-31 23:59')
WHERE merge_requests.target_project_id IN (278964, 7764) 

ERROR: canceling statement due to statement timeout

New query:

SELECT count("merge_request_metrics".merge_request_id)
FROM "merge_requests"
INNER JOIN merge_request_metrics ON merge_request_metrics.merge_request_id=merge_requests.id AND (merged_at > '2020-05-01 00:00') AND (merged_at < '2020-05-31 23:59')
WHERE merge_request_metrics.target_project_id IN (278964, 7764) 

Note: There is an index on merged_at and target_project_id, I'll add it in a follow up issue where the query is going to be implemented.

Joining the merge_requests table will be required if we need extra filtering on the data (assignee, author, labels, etc.).

Migrations

Up

== 20200723125205 AddTargetProjectIdToMrMetrics: migrating ====================
-- add_column(:merge_request_metrics, :target_project_id, :integer)
   -> 0.0008s
== 20200723125205 AddTargetProjectIdToMrMetrics: migrated (0.0045s) ===========

== 20200723128332 AddFkToMetricsTargetProjectId: migrating ====================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:merge_request_metrics, :target_project_id, {:algorithm=>:concurrently})
   -> 0.0031s
-- add_index(:merge_request_metrics, :target_project_id, {:algorithm=>:concurrently})
   -> 0.0051s
-- transaction_open?()
   -> 0.0000s
-- foreign_keys(:merge_request_metrics)
   -> 0.0019s
-- execute("ALTER TABLE merge_request_metrics\nADD CONSTRAINT fk_56067dcb44\nFOREIGN KEY (target_project_id)\nREFERENCES projects (id)\nON DELETE CASCADE\nNOT VALID;\n")
   -> 0.0023s
-- execute("ALTER TABLE merge_request_metrics VALIDATE CONSTRAINT fk_56067dcb44;")
   -> 0.0026s
== 20200723128332 AddFkToMetricsTargetProjectId: migrated (0.0177s) ===========

== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: migrating =======
== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: migrated (0.0450s)

Down

== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: reverting =======
== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: reverted (0.0000s)

== 20200723128332 AddFkToMetricsTargetProjectId: reverting ====================
-- remove_foreign_key(:merge_request_metrics, {:column=>:target_project_id})
   -> 0.0059s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:merge_request_metrics, :target_project_id, {:algorithm=>:concurrently})
   -> 0.0038s
-- remove_index(:merge_request_metrics, {:algorithm=>:concurrently, :column=>:target_project_id})
   -> 0.0048s
== 20200723128332 AddFkToMetricsTargetProjectId: reverted (0.0148s) ===========

== 20200723125205 AddTargetProjectIdToMrMetrics: reverting ====================
-- remove_column(:merge_request_metrics, :target_project_id, :integer)
   -> 0.0013s
== 20200723125205 AddTargetProjectIdToMrMetrics: reverted (0.0039s) ===========

BG migration

48M records, 5K batches. Estimated runtime: about 13 days

WITH merge_requests_batch AS
  (SELECT id,
          target_project_id
   FROM merge_requests
   WHERE id BETWEEN 27079 AND 33101)
UPDATE merge_request_metrics
SET target_project_id = merge_requests_batch.target_project_id
FROM merge_requests_batch
WHERE merge_request_metrics.merge_request_id=merge_requests_batch.id

Plan

The plan is from DB lab, most of the time is spent in IO.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Adam Hegyi

Merge request reports