Skip to content

Add unique foreign keys to Geo registries

What does this MR do?

There are a number of registry tables which do not have a uniqueness constraint on their foreign keys:

  • container_repository_registry
  • job_artifact_registry
  • merge_request_diff_registry
  • package_file_registry
  • terraform_state_version_registry

Adding the uniqueness constraints would reduce the risk of bad data.

Console output

UP:

== 20210217020152 AddUniqueIndexOnPackageFileRegistry: migrating ==============
-- execute("      DELETE FROM package_file_registry\n      USING (\n        SELECT package_file_id, MIN(id) as min_id\n        FROM package_file_registry\n        GROUP BY package_file_id\n        HAVING COUNT(id) > 1\n      ) as package_file_registry_duplicates\n      WHERE package_file_registry_duplicates.package_file_id = package_file_registry.package_file_id\n      AND package_file_registry_duplicates.min_id <> package_file_registry.id\n")
   -> 0.0217s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:package_file_registry, :package_file_id, {:unique=>true, :name=>"unique_index_package_file_registry_on_package_file_id", :algorithm=>:concurrently})
   -> 0.0034s
-- add_index(:package_file_registry, :package_file_id, {:unique=>true, :name=>"unique_index_package_file_registry_on_package_file_id", :algorithm=>:concurrently})
   -> 0.0233s
-- transaction_open?()
   -> 0.0000s
-- indexes(:package_file_registry)
   -> 0.0028s
-- remove_index(:package_file_registry, {:algorithm=>:concurrently, :name=>"index_package_file_registry_on_repository_id"})
   -> 0.0060s
== 20210217020152 AddUniqueIndexOnPackageFileRegistry: migrated (0.0586s) =====

== 20210217020153 AddUniqueIndexOnJobArtifactRegistry: migrating ==============
-- execute("      DELETE FROM job_artifact_registry\n      USING (\n        SELECT artifact_id, MIN(id) as min_id\n        FROM job_artifact_registry\n        GROUP BY artifact_id\n        HAVING COUNT(id) > 1\n      ) as job_artifact_registry_duplicates\n      WHERE job_artifact_registry_duplicates.artifact_id = job_artifact_registry.artifact_id\n      AND job_artifact_registry_duplicates.min_id <> job_artifact_registry.id\n")
   -> 0.0009s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:job_artifact_registry, :artifact_id, {:unique=>true, :name=>"unique_index_job_artifact_registry_on_artifact_id", :algorithm=>:concurrently})
   -> 0.0016s
-- add_index(:job_artifact_registry, :artifact_id, {:unique=>true, :name=>"unique_index_job_artifact_registry_on_artifact_id", :algorithm=>:concurrently})
   -> 0.0022s
-- transaction_open?()
   -> 0.0000s
-- indexes(:job_artifact_registry)
   -> 0.0018s
-- remove_index(:job_artifact_registry, {:algorithm=>:concurrently, :name=>"index_job_artifact_registry_on_artifact_id"})
   -> 0.0009s
== 20210217020153 AddUniqueIndexOnJobArtifactRegistry: migrated (0.0087s) =====

== 20210217020154 AddUniqueIndexOnContainerRepositoryRegistry: migrating ======
-- execute("      DELETE FROM container_repository_registry\n      USING (\n        SELECT container_repository_id, MIN(id) as min_id\n        FROM container_repository_registry\n        GROUP BY container_repository_id\n        HAVING COUNT(id) > 1\n      ) as container_repository_registry_duplicates\n      WHERE container_repository_registry_duplicates.container_repository_id = container_repository_registry.container_repository_id\n      AND container_repository_registry_duplicates.min_id <> container_repository_registry.id\n")
   -> 0.0009s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:container_repository_registry, :container_repository_id, {:unique=>true, :name=>"unique_index_container_repository_registry_on_repository_id", :algorithm=>:concurrently})
   -> 0.0016s
-- add_index(:container_repository_registry, :container_repository_id, {:unique=>true, :name=>"unique_index_container_repository_registry_on_repository_id", :algorithm=>:concurrently})
   -> 0.0029s
-- transaction_open?()
   -> 0.0000s
-- indexes(:container_repository_registry)
   -> 0.0019s
-- remove_index(:container_repository_registry, {:algorithm=>:concurrently, :name=>"index_container_repository_registry_on_repository_id"})
   -> 0.0011s
== 20210217020154 AddUniqueIndexOnContainerRepositoryRegistry: migrated (0.0097s)

== 20210217020155 AddUniqueIndexOnMergeRequestDiffRegistry: migrating =========
-- execute("      DELETE FROM merge_request_diff_registry\n      USING (\n        SELECT merge_request_diff_id, MIN(id) as min_id\n        FROM merge_request_diff_registry\n        GROUP BY merge_request_diff_id\n        HAVING COUNT(id) > 1\n      ) as merge_request_diff_registry_duplicates\n      WHERE merge_request_diff_registry_duplicates.merge_request_diff_id = merge_request_diff_registry.merge_request_diff_id\n      AND merge_request_diff_registry_duplicates.min_id <> merge_request_diff_registry.id\n")
   -> 0.0010s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:merge_request_diff_registry, :merge_request_diff_id, {:unique=>true, :name=>"unique_index_merge_request_diff_registry_on_mr_diff_id", :algorithm=>:concurrently})
   -> 0.0016s
-- add_index(:merge_request_diff_registry, :merge_request_diff_id, {:unique=>true, :name=>"unique_index_merge_request_diff_registry_on_mr_diff_id", :algorithm=>:concurrently})
   -> 0.0026s
-- transaction_open?()
   -> 0.0000s
-- indexes(:merge_request_diff_registry)
   -> 0.0016s
-- remove_index(:merge_request_diff_registry, {:algorithm=>:concurrently, :name=>"index_merge_request_diff_registry_on_mr_diff_id"})
   -> 0.0009s
== 20210217020155 AddUniqueIndexOnMergeRequestDiffRegistry: migrated (0.0089s)

== 20210217020156 AddUniqueIndexOnTerraformStateVersionRegistry: migrating ====
-- execute("      DELETE FROM terraform_state_version_registry\n      USING (\n        SELECT terraform_state_version_id, MIN(id) as min_id\n        FROM terraform_state_version_registry\n        GROUP BY terraform_state_version_id\n        HAVING COUNT(id) > 1\n      ) as terraform_state_version_registry_duplicates\n      WHERE terraform_state_version_registry_duplicates.terraform_state_version_id = terraform_state_version_registry.terraform_state_version_id\n      AND terraform_state_version_registry_duplicates.min_id <> terraform_state_version_registry.id\n")
   -> 0.0009s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:terraform_state_version_registry, :terraform_state_version_id, {:unique=>true, :name=>"unique_index_tf_state_versions_registry_on_tf_state_versions_id", :algorithm=>:concurrently})
   -> 0.0017s
-- add_index(:terraform_state_version_registry, :terraform_state_version_id, {:unique=>true, :name=>"unique_index_tf_state_versions_registry_on_tf_state_versions_id", :algorithm=>:concurrently})
   -> 0.0022s
-- transaction_open?()
   -> 0.0000s
-- indexes(:terraform_state_version_registry)
   -> 0.0020s
-- remove_index(:terraform_state_version_registry, {:algorithm=>:concurrently, :name=>"index_tf_state_versions_registry_on_tf_state_versions_id"})
   -> 0.0016s
== 20210217020156 AddUniqueIndexOnTerraformStateVersionRegistry: migrated (0.0097s)

DOWN:

== 20210217020152 AddUniqueIndexOnPackageFileRegistry: reverting ==============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:package_file_registry, :package_file_id, {:name=>"index_package_file_registry_on_repository_id", :algorithm=>:concurrently})
   -> 0.0042s
-- add_index(:package_file_registry, :package_file_id, {:name=>"index_package_file_registry_on_repository_id", :algorithm=>:concurrently})
   -> 0.0341s
-- transaction_open?()
   -> 0.0000s
-- indexes(:package_file_registry)
   -> 0.0031s
-- remove_index(:package_file_registry, {:algorithm=>:concurrently, :name=>"unique_index_package_file_registry_on_package_file_id"})
   -> 0.0018s
== 20210217020152 AddUniqueIndexOnPackageFileRegistry: reverted (0.0450s) =====

== 20210217020153 AddUniqueIndexOnJobArtifactRegistry: reverting ==============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:job_artifact_registry, :artifact_id, {:name=>"index_job_artifact_registry_on_artifact_id", :algorithm=>:concurrently})
   -> 0.0033s
-- add_index(:job_artifact_registry, :artifact_id, {:name=>"index_job_artifact_registry_on_artifact_id", :algorithm=>:concurrently})
   -> 0.0033s
-- transaction_open?()
   -> 0.0000s
-- indexes(:job_artifact_registry)
   -> 0.0019s
-- remove_index(:job_artifact_registry, {:algorithm=>:concurrently, :name=>"unique_index_job_artifact_registry_on_artifact_id"})
   -> 0.0017s
== 20210217020153 AddUniqueIndexOnJobArtifactRegistry: reverted (0.0118s) =====

== 20210217020154 AddUniqueIndexOnContainerRepositoryRegistry: reverting ======
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:container_repository_registry, :container_repository_id, {:name=>"index_container_repository_registry_on_repository_id", :algorithm=>:concurrently})
   -> 0.0034s
-- add_index(:container_repository_registry, :container_repository_id, {:name=>"index_container_repository_registry_on_repository_id", :algorithm=>:concurrently})
   -> 0.0036s
-- transaction_open?()
   -> 0.0000s
-- indexes(:container_repository_registry)
   -> 0.0018s
-- remove_index(:container_repository_registry, {:algorithm=>:concurrently, :name=>"unique_index_container_repository_registry_on_repository_id"})
   -> 0.0016s
== 20210217020154 AddUniqueIndexOnContainerRepositoryRegistry: reverted (0.0119s)

== 20210217020155 AddUniqueIndexOnMergeRequestDiffRegistry: reverting =========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:merge_request_diff_registry, :merge_request_diff_id, {:name=>"index_merge_request_diff_registry_on_mr_diff_id", :algorithm=>:concurrently})
   -> 0.0033s
-- add_index(:merge_request_diff_registry, :merge_request_diff_id, {:name=>"index_merge_request_diff_registry_on_mr_diff_id", :algorithm=>:concurrently})
   -> 0.0035s
-- transaction_open?()
   -> 0.0000s
-- indexes(:merge_request_diff_registry)
   -> 0.0020s
-- remove_index(:merge_request_diff_registry, {:algorithm=>:concurrently, :name=>"unique_index_merge_request_diff_registry_on_mr_diff_id"})
   -> 0.0018s
== 20210217020155 AddUniqueIndexOnMergeRequestDiffRegistry: reverted (0.0122s)

== 20210217020156 AddUniqueIndexOnTerraformStateVersionRegistry: reverting ====
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:terraform_state_version_registry, :terraform_state_version_id, {:name=>"index_tf_state_versions_registry_on_tf_state_versions_id", :algorithm=>:concurrently})
   -> 0.0037s
-- add_index(:terraform_state_version_registry, :terraform_state_version_id, {:name=>"index_tf_state_versions_registry_on_tf_state_versions_id", :algorithm=>:concurrently})
   -> 0.0031s
-- transaction_open?()
   -> 0.0000s
-- indexes(:terraform_state_version_registry)
   -> 0.0024s
-- remove_index(:terraform_state_version_registry, {:algorithm=>:concurrently, :name=>"unique_index_tf_state_versions_registry_on_tf_state_versions_id"})
   -> 0.0015s
== 20210217020156 AddUniqueIndexOnTerraformStateVersionRegistry: reverted (0.0121s)

Queries:

Package File Registry deletes (Depesz)[https://explain.depesz.com/s/DFxp]:

SQL:

      DELETE FROM package_file_registry
      USING (
        SELECT package_file_id, MIN(id) as min_id
        FROM package_file_registry
        GROUP BY package_file_id
        HAVING COUNT(id) > 1
      ) as package_file_registry_duplicates
      WHERE package_file_registry_duplicates.package_file_id = package_file_registry.package_file_id
      AND package_file_registry_duplicates.min_id <> package_file_registry.id

Plan:

 Delete on package_file_registry  (cost=1916.58..3544.57 rows=25701 width=38) (actual time=93.977..93.978 rows=0 loops=1)
   Buffers: shared hit=78143 dirtied=73
   ->  Hash Join  (cost=1916.58..3544.57 rows=25701 width=38) (actual time=24.913..49.609 rows=77009 loops=1)
         Hash Cond: (package_file_registry.package_file_id = package_file_registry_duplicates.package_file_id)
         Join Filter: (package_file_registry_duplicates.min_id <> package_file_registry.id)
         Rows Removed by Join Filter: 1
         Buffers: shared hit=1134 dirtied=73
         ->  Seq Scan on package_file_registry  (cost=0.00..1338.05 rows=77105 width=14) (actual time=0.017..8.819 rows=77034 loops=1)
               Buffers: shared hit=567 dirtied=1
         ->  Hash  (cost=1916.53..1916.53 rows=4 width=40) (actual time=24.886..24.887 rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               Buffers: shared hit=567 dirtied=72
               ->  Subquery Scan on package_file_registry_duplicates  (cost=1916.34..1916.53 rows=4 width=40) (actual time=24.881..24.884 rows=1 loops=1)
                     Buffers: shared hit=567 dirtied=72
                     ->  HashAggregate  (cost=1916.34..1916.49 rows=4 width=8) (actual time=24.877..24.879 rows=1 loops=1)
                           Group Key: package_file_registry_1.package_file_id
                           Filter: (count(package_file_registry_1.id) > 1)
                           Rows Removed by Filter: 24
                           Buffers: shared hit=567 dirtied=72
                           ->  Seq Scan on package_file_registry package_file_registry_1  (cost=0.00..1338.05 rows=77105 width=8) (actual time=0.004..10.790 rows=77034 loops=1)
                                 Buffers: shared hit=567 dirtied=72
 Planning Time: 0.425 ms
 Execution Time: 94.028 ms

Job Artifact Deletes (Depesz)[https://explain.depesz.com/s/6VwO]:

SQL:

      DELETE FROM job_artifact_registry
      USING (
        SELECT artifact_id, MIN(id) as min_id
        FROM job_artifact_registry
        GROUP BY artifact_id
        HAVING COUNT(id) > 1
      ) as job_artifact_registry_duplicates
      WHERE job_artifact_registry_duplicates.artifact_id = job_artifact_registry.artifact_id
      AND job_artifact_registry_duplicates.min_id <> job_artifact_registry.id

Plan:

 Delete on job_artifact_registry  (cost=2091.81..3793.38 rows=27523 width=38) (actual time=143.357..143.358 rows=0 loops=1)
   Buffers: shared hit=85157 dirtied=618
   ->  Hash Join  (cost=2091.81..3793.38 rows=27523 width=38) (actual time=26.620..54.384 rows=83919 loops=1)
         Hash Cond: (job_artifact_registry.artifact_id = job_artifact_registry_duplicates.artifact_id)
         Join Filter: (job_artifact_registry_duplicates.min_id <> job_artifact_registry.id)
         Rows Removed by Join Filter: 1
         Buffers: shared hit=1238
         ->  Seq Scan on job_artifact_registry  (cost=0.00..1459.98 rows=84098 width=14) (actual time=0.012..10.552 rows=84074 loops=1)
               Buffers: shared hit=619
         ->  Hash  (cost=2091.58..2091.58 rows=18 width=40) (actual time=26.554..26.555 rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               Buffers: shared hit=619
               ->  Subquery Scan on job_artifact_registry_duplicates  (cost=2090.72..2091.58 rows=18 width=40) (actual time=26.514..26.520 rows=1 loops=1)
                     Buffers: shared hit=619
                     ->  HashAggregate  (cost=2090.72..2091.40 rows=18 width=8) (actual time=26.510..26.516 rows=1 loops=1)
                           Group Key: job_artifact_registry_1.artifact_id
                           Filter: (count(job_artifact_registry_1.id) > 1)
                           Rows Removed by Filter: 154
                           Buffers: shared hit=619
                           ->  Seq Scan on job_artifact_registry job_artifact_registry_1  (cost=0.00..1459.98 rows=84098 width=8) (actual time=0.005..8.439 rows=84074 loops=1)
                                 Buffers: shared hit=619
 Planning Time: 0.274 ms
 Execution Time: 143.439 ms

Container Repository Registry Deletes (Depesz)[https://explain.depesz.com/s/6VfR]:

SQL:

      DELETE FROM container_repository_registry
      USING (
        SELECT container_repository_id, MIN(id) as min_id
        FROM container_repository_registry
        GROUP BY container_repository_id
        HAVING COUNT(id) > 1
      ) as container_repository_registry_duplicates
      WHERE container_repository_registry_duplicates.container_repository_id = container_repository_registry.container_repository_id
      AND container_repository_registry_duplicates.min_id <> container_repository_registry.id

Plan:

      DELETE FROM container_repository_registry
      USING (
        SELECT container_repository_id, MIN(id) as min_id
        FROM container_repository_registry
        GROUP BY container_repository_id
        HAVING COUNT(id) > 1
      ) as container_repository_registry_duplicates
      WHERE container_repository_registry_duplicates.container_repository_id = container_repository_registry.container_repository_id
      AND container_repository_registry_duplicates.min_id <> container_repository_registry.id

Merge Request Diff Registry Deletes (Depesz)[https://explain.depesz.com/s/JRGw]:

SQL:

      DELETE FROM merge_request_diff_registry
      USING (
        SELECT merge_request_diff_id, MIN(id) as min_id
        FROM merge_request_diff_registry
        GROUP BY merge_request_diff_id
        HAVING COUNT(id) > 1
      ) as merge_request_diff_registry_duplicates
      WHERE merge_request_diff_registry_duplicates.merge_request_diff_id = merge_request_diff_registry.merge_request_diff_id
      AND merge_request_diff_registry_duplicates.min_id <> merge_request_diff_registry.id

Plan:

 Delete on merge_request_diff_registry  (cost=2009.40..4625.00 rows=80822 width=46) (actual time=129.247..129.248 rows=0 loops=1)
   Buffers: shared hit=81980 dirtied=595
   ->  Nested Loop  (cost=2009.40..4625.00 rows=80822 width=46) (actual time=24.761..45.344 rows=80790 loops=1)
         Join Filter: ((merge_request_diff_registry_duplicates.min_id <> merge_request_diff_registry.id) AND (merge_request_diff_registry.merge_request_diff_id = merge_request_diff_registry_duplicates.merge_request_diff_id))
         Rows Removed by Join Filter: 1
         Buffers: shared hit=1190 dirtied=35
         ->  Subquery Scan on merge_request_diff_registry_duplicates  (cost=2009.40..2009.43 rows=1 width=56) (actual time=24.751..24.754 rows=1 loops=1)
               Buffers: shared hit=595 dirtied=35
               ->  HashAggregate  (cost=2009.40..2009.42 rows=1 width=16) (actual time=24.747..24.748 rows=1 loops=1)
                     Group Key: merge_request_diff_registry_1.merge_request_diff_id
                     Filter: (count(merge_request_diff_registry_1.id) > 1)
                     Buffers: shared hit=595 dirtied=35
                     ->  Seq Scan on merge_request_diff_registry merge_request_diff_registry_1  (cost=0.00..1403.23 rows=80823 width=16) (actual time=0.008..9.252 rows=80791 loops=1)
                           Buffers: shared hit=595 dirtied=35
         ->  Seq Scan on merge_request_diff_registry  (cost=0.00..1403.23 rows=80823 width=22) (actual time=0.008..9.741 rows=80791 loops=1)
               Buffers: shared hit=595
 Planning Time: 0.544 ms
 Execution Time: 129.301 ms

Terraform State Version Registry Deletes (Depesz)[https://explain.depesz.com/s/4vndW]:

SQL:

      DELETE FROM terraform_state_version_registry
      USING (
        SELECT terraform_state_version_id, MIN(id) as min_id
        FROM terraform_state_version_registry
        GROUP BY terraform_state_version_id
        HAVING COUNT(id) > 1
      ) as terraform_state_version_registry_duplicates
      WHERE terraform_state_version_registry_duplicates.terraform_state_version_id = terraform_state_version_registry.terraform_state_version_id
      AND terraform_state_version_registry_duplicates.min_id <> terraform_state_version_registry.id

Plan:

 Delete on terraform_state_version_registry  (cost=2115.88..4870.30 rows=85135 width=46) (actual time=134.908..134.909 rows=0 loops=1)
   Buffers: shared hit=86252
   ->  Nested Loop  (cost=2115.88..4870.30 rows=85135 width=46) (actual time=25.865..47.237 rows=85000 loops=1)
         Join Filter: ((terraform_state_version_registry_duplicates.min_id <> terraform_state_version_registry.id) AND (terraform_state_version_registry.terraform_state_version_id = terraform_state_version_registry_duplicates.terraform_state_version_id))
         Rows Removed by Join Filter: 1
         Buffers: shared hit=1252
         ->  Subquery Scan on terraform_state_version_registry_duplicates  (cost=2115.88..2115.90 rows=1 width=56) (actual time=25.855..25.857 rows=1 loops=1)
               Buffers: shared hit=626
               ->  HashAggregate  (cost=2115.88..2115.89 rows=1 width=16) (actual time=25.851..25.852 rows=1 loops=1)
                     Group Key: terraform_state_version_registry_1.terraform_state_version_id
                     Filter: (count(terraform_state_version_registry_1.id) > 1)
                     Buffers: shared hit=626
                     ->  Seq Scan on terraform_state_version_registry terraform_state_version_registry_1  (cost=0.00..1477.36 rows=85136 width=16) (actual time=0.019..9.925 rows=85001 loops=1)
                           Buffers: shared hit=626
         ->  Seq Scan on terraform_state_version_registry  (cost=0.00..1477.36 rows=85136 width=22) (actual time=0.008..9.726 rows=85001 loops=1)
               Buffers: shared hit=626
 Planning Time: 0.539 ms
 Execution Time: 134.975 ms
(18 rows)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #296928 (closed)

Edited by Alex Ives

Merge request reports