Skip to content

Deduplicate epic iids and add uniqueness constraint

Alexandru Croitor requested to merge issue-277354 into master

What does this MR do?

  • Detects and fixes Epic IID duplicates per group within DB.
  • Adds a unique index constraint to epics table on group_id, iid pair of fields
  • As index migration needs to be ran after the data has been de-duplicated, both migrations are added to post_migrate

Current number duplicates on gitlab.com: ~14

gitlabhq_production=> SELECT iid, group_id, COUNT(*) FROM epics GROUP BY iid, group_id HAVING COUNT(*) > 1;
 iid  | group_id | count
------+----------+-------
   10 |  7936550 |     2
    6 |  5806375 |     2
    2 | 10009200 |     2
    1 |  9811553 |     2
    1 |  9806281 |     2
    1 |  4925694 |     2
 4827 |     9970 |     2
    1 | 10009200 |     3
    1 |  7569022 |     3
    2 |  6874569 |     2
    1 |  9861922 |     2
    8 |  5806375 |     2
    2 |  6536162 |     2
    1 |  6874569 |     2
(14 rows)
Execution plan
HashAggregate  (cost=17031.82..17103.31 rows=1906 width=16) (actual time=317.357..329.170 rows=14 loops=1)
   Group Key: epics.iid, epics.group_id
   Filter: (count(*) > 1)
   Rows Removed by Filter: 57181
   Buffers: shared hit=184 read=3788 dirtied=8
   ->  Seq Scan on public.epics  (cost=0.00..16459.91 rows=57191 width=8) (actual time=0.151..280.305 rows=57211 loops=1)
         Buffers: shared hit=184 read=3788 dirtied=8

Time: 330.285 ms
  - planning: 0.272 ms
  - execution: 330.013 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 184 (~1.40 MiB) from the buffer pool
  - reads: 3788 (~29.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 8 (~64.00 KiB)
  - writes: 0

SQLs generated when running the migration

(3.1ms) SELECT iid, group_id, COUNT(*) FROM epics GROUP BY iid, group_id HAVING COUNT(*) > 1;
Epic Load (1.2ms)  SELECT "epics".* FROM "epics" WHERE "epics"."iid" = $1 AND "epics"."group_id" = $2 ORDER BY "epics"."id" ASC LIMIT $3  [["iid", 2], ["group_id", 22], ["LIMIT", 1]]
Group Load (0.6ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = $1 AND "namespaces"."id" = $2 LIMIT $3  [["type", "Group"], ["id", 22], ["LIMIT", 1]]
(0.1ms)  BEGIN
InternalId Load (0.4ms)  SELECT "internal_ids".* FROM "internal_ids" WHERE "internal_ids"."namespace_id" = $1 AND "internal_ids"."usage" = $2 LIMIT $3  [["namespace_id", 22], ["usage", 4], ["LIMIT", 1]]
InternalId Load (0.5ms)  SELECT "internal_ids".* FROM "internal_ids" WHERE "internal_ids"."id" = $1 LIMIT $2 FOR UPDATE  [["id", 12], ["LIMIT", 1]]
InternalId Update (0.5ms)  UPDATE "internal_ids" SET "last_value" = $1 WHERE "internal_ids"."id" = $2  [["last_value", 9], ["id", 12]]
(0.9ms)  COMMIT

related to #277354 (closed)

Migrations Output

Up

 gitlab(issue-277354)$ be rake db:migrate
== 20201106134950 DeduplicateEpicIids: migrating ==============================
== 20201106134950 DeduplicateEpicIids: migrated (0.2515s) =====================

== 20201106151002 AddUniqIndexToEpicIidGroupId: migrating =====================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:epics, [:group_id, :iid], {:unique=>true, :name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently})
   -> 0.0077s
-- execute("SET statement_timeout TO 0")
   -> 0.0002s
-- add_index(:epics, [:group_id, :iid], {:unique=>true, :name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently})
   -> 0.0081s
-- execute("RESET ALL")
   -> 0.0002s
== 20201106151002 AddUniqIndexToEpicIidGroupId: migrated (0.0168s) ============

Down

 gitlab(issue-277354)$ be rake db:migrate:down VERSION=20201106151002
== 20201106151002 AddUniqIndexToEpicIidGroupId: reverting =====================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:epics, [:group_id, :iid], {:name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently})
   -> 0.0120s
-- execute("SET statement_timeout TO 0")
   -> 0.0002s
-- remove_index(:epics, {:name=>"index_epics_on_group_id_and_iid", :algorithm=>:concurrently, :column=>[:group_id, :iid]})
   -> 0.0163s
-- execute("RESET ALL")
   -> 0.0002s
== 20201106151002 AddUniqIndexToEpicIidGroupId: reverted (0.0293s) ============

 gitlab(issue-277354)$ be rake db:migrate:down VERSION=20201106134950
== 20201106134950 DeduplicateEpicIids: reverting ==============================
== 20201106134950 DeduplicateEpicIids: reverted (0.0000s) =====================

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Alexandru Croitor

Merge request reports