Skip to content

Dedup issue_metrics table

Adam Hegyi requested to merge 214456-dedup-issues-metrics into master

What does this MR do?

This MR deduplicates the issue_metrics table in a similar fashion like !29566 (merged)

The MR does the following:

  1. The application code ensures that issue_metrics records are created with the safe find_or_create method to avoid RecordNotUnique errors (if metrics records are created concurrently).
  2. Add a new unique index with the highest issue_id which prevents further duplicates (very unlikely that we'll see duplicates).
  3. Iterate over issue_metrics with EachBatch and find duplicated entries.
  4. Eliminate the duplicated rows by "merging" them.
  5. Add global unique index on issue_id and remove the tmp index created by step 2.

Migration

Up:

== 20210226141517 DedupIssueMetrics: migrating ================================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:issue_metrics, :issue_id, {:where=>"id > 504", :unique=>true, :name=>"tmp_unique_issue_metrics_by_issue_id", :algorithm=>:concurrently})
   -> 0.0020s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:issue_metrics, :issue_id, {:where=>"id > 504", :unique=>true, :name=>"tmp_unique_issue_metrics_by_issue_id", :algorithm=>:concurrently})
   -> 0.0182s
-- execute("RESET ALL")
   -> 0.0005s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:issue_metrics, :issue_id, {:unique=>true, :name=>"index_unique_issue_metrics_issue_id", :algorithm=>:concurrently})
   -> 0.0018s
-- add_index(:issue_metrics, :issue_id, {:unique=>true, :name=>"index_unique_issue_metrics_issue_id", :algorithm=>:concurrently})
   -> 0.0035s
-- transaction_open?()
   -> 0.0000s
-- indexes(:issue_metrics)
   -> 0.0018s
-- remove_index(:issue_metrics, {:algorithm=>:concurrently, :name=>"tmp_unique_issue_metrics_by_issue_id"})
   -> 0.0015s
-- transaction_open?()
   -> 0.0000s
-- indexes(:issue_metrics)
   -> 0.0014s
-- remove_index(:issue_metrics, {:algorithm=>:concurrently, :name=>"index_issue_metrics"})
   -> 0.0018s
== 20210226141517 DedupIssueMetrics: migrated (0.0476s) =======================

Down:

== 20210226141517 DedupIssueMetrics: reverting ================================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:issue_metrics, :issue_id, {:name=>"index_issue_metrics", :algorithm=>:concurrently})
   -> 0.0028s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:issue_metrics, :issue_id, {:name=>"index_issue_metrics", :algorithm=>:concurrently})
   -> 0.0184s
-- execute("RESET ALL")
   -> 0.0016s
-- transaction_open?()
   -> 0.0000s
-- indexes(:issue_metrics)
   -> 0.0060s
-- current_schema()
   -> 0.0005s
-- transaction_open?()
   -> 0.0000s
-- indexes(:issue_metrics)
   -> 0.0062s
-- remove_index(:issue_metrics, {:algorithm=>:concurrently, :name=>"index_unique_issue_metrics_issue_id"})
   -> 0.0039s
== 20210226141517 DedupIssueMetrics: reverted (0.0498s) =======================

Database

Queries:

Timing calculation:

(32_000_000 rows / BATCH_SIZE_1000) * (2 queries, ~10ms) = < 6 minutes

The calculation does not contain the deduplication time. IMHO it's negligible since we don't have too many duplicated rows (6 rows).

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #214456 (closed)

Edited by Adam Hegyi

Merge request reports