Deduplicating MR metrics table
What does this MR do?
Related issue: #214456 (closed)
This MR de-duplicates records in merge_request_metrics
table. Merge Request has one Merge Request Metrics record.
We have about 511 records on PRD and it's very unlikely that we'll have more on self-hosted.
There is a small chance to create duplicated merge_request_metrics
records when an MR is updated. Relevant code: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/issuable.rb#L469
ensure_metrics
is triggered via the after_save
callback (even after update).
Possible cause:
When we import merge requests, we skip the callbacks to speed up the import process. This means the metrics
data is simply not created. Some time later two BG job runs on the MR at the same time and creates the duplicated record. Probably our MR Refresh*
jobs.
Note: the same issue occurs in the issue_metrics
table.
Checking for duplicates
select count(*) from (select merge_request_id, count(merge_request_id) from merge_request_metrics group by merge_request_id having count(merge_request_id) > 1) merge_request_metrics;
Post Migration steps:
- Add a new, temporary unique index on
merge_request_id
where we have a condition on the last MR id (id > last_mr_metrics_id
). This ensures that new records will be unique. - Iterate over the
merge_request_metrics
table and merge the duplicated records. - Add unique index on
merge_request_id
. - Remove the temporary index.
Query: (1k batch)
SELECT "merge_request_metrics"."merge_request_id"
FROM "merge_request_metrics"
WHERE "merge_request_metrics"."merge_request_id" IN
(SELECT "merge_request_metrics"."merge_request_id"
FROM "merge_request_metrics"
WHERE "merge_request_metrics"."id" >= 15925818
AND "merge_request_metrics"."id" < 15926834)
GROUP BY "merge_request_metrics"."merge_request_id"
HAVING (COUNT(merge_request_metrics.merge_request_id) > 1)
Expected runtime: 2-5 minutes
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team