Fix container_repositories index repair to handle one-to-one relationship

What does this MR do and why?

The container_repository_states table has a primary key constraint on container_repository_id, enforcing a one-to-one relationship with container_repositories. When duplicate container_repositories entries exist, previously the repair task attempted to update all container_repository_states references to point to the "good" ID, which violated the primary key constraint when a state entry already existed for that ID.

For example, suppose the container_repositories table:

id project_id name
1 1000 test
2 1000 test
3 1000 test

With this container_repository_states table:

container_repository_id verified_at
1 2025-12-01
2 2025-12-02
3 2025-12-03

The old approach tried to fix the duplication in container_repository_states by remapping the container_repository_id:

  UPDATE container_repository_states SET container_repository_id = 3 WHERE container_repository_id IN (1)

However, this failed with:

  PG::UniqueViolation: duplicate key value violates unique constraint "container_repository_states_pkey"

The fix adds a new delete_orphaned option to reference configurations. When set to true, the repair task deletes all references pointing to the duplicate (bad) IDs before deleting the duplicate records. This approach:

  1. Deletes the container_repository_states rows for duplicate IDs.
  2. Deletes the duplicate container_repository rows.
  3. Preserves the state entry for the good ID.

This is simpler and more appropriate than using deduplication_column for one-to-one relationships where we don't need to preserve the orphaned references.

References

ZD: https://gitlab.zendesk.com/agent/tickets/676052

How to set up and validate locally

  1. Run gdk psql -d gitlabhq_test
  2. Run these SQL queries:
-- 1. Create a test organization (required for namespace and project)
INSERT INTO organizations (
  id,
  name,
  path,
  created_at,
  updated_at
)
VALUES (
  999999,
  'test-organization',
  'test-organization',
  NOW(),
  NOW()
)
ON CONFLICT (id) DO NOTHING;

-- 2. Create a test namespace (required for project_namespace_id foreign key)
INSERT INTO namespaces (
  id,
  name,
  path,
  type,
  created_at,
  updated_at,
  organization_id
)
VALUES (
  999999,
  'test-namespace',
  'test-namespace',
  'Project',
  NOW(),
  NOW(),
  999999  -- Use the test organization we just created
)
ON CONFLICT (id) DO NOTHING;

-- 3. Create a test project (required for foreign key constraint)
INSERT INTO projects (
  id,
  name,
  path,
  namespace_id,
  project_namespace_id,
  organization_id,
  created_at,
  updated_at
)
VALUES (
  999999,
  'test-project',
  'test-project',
  999999,  -- Use the test namespace we just created
  999999,  -- Use the test namespace we just created
  999999,  -- Use the test organization we just created
  NOW(),
  NOW()
)
ON CONFLICT (id) DO NOTHING;

-- 4. Drop the unique index to allow duplicates
DROP INDEX IF EXISTS index_container_repositories_on_project_id_and_name;

-- 5. Create test data: duplicate container_repositories entries
-- Simulate the scenario where we have duplicates with project_id=999999, name='test-repo'
INSERT INTO container_repositories (
  project_id,
  name,
  created_at,
  updated_at
)
VALUES
  (999999, 'test-repo', NOW(), NOW()),      -- ID: will be the "good" ID
  (999999, 'test-repo', NOW(), NOW());      -- ID: will be the "bad" ID (duplicate)

-- 6. Create container_repository_states entries
-- We'll use the actual IDs from the previous insert
-- First, get the IDs dynamically
WITH inserted_ids AS (
  SELECT id FROM container_repositories
  WHERE project_id = 999999 AND name = 'test-repo'
  ORDER BY id
)
INSERT INTO container_repository_states (container_repository_id, verified_at)
SELECT id, NOW() FROM inserted_ids;
  1. With this data, you should see:
gitlabhq_test=# \x
Expanded display is on.
gitlabhq_test=# select * from container_repository_states;
-[ RECORD 1 ]------------+------------------------------
verification_started_at  |
verification_retry_at    |
verified_at              | 2025-12-20 05:47:55.426783+00
container_repository_id  | 1
verification_state       | 0
verification_retry_count | 0
verification_checksum    |
verification_failure     |
project_id               | 999999
-[ RECORD 2 ]------------+------------------------------
verification_started_at  |
verification_retry_at    |
verified_at              | 2025-12-20 05:47:55.426783+00
container_repository_id  | 2
verification_state       | 0
verification_retry_count | 0
verification_checksum    |
verification_failure     |
project_id               | 999999

gitlabhq_test=# select * from container_repositories;
-[ RECORD 1 ]--------------------+---------------------------
id                               | 1
project_id                       | 999999
name                             | test-repo
created_at                       | 2025-12-20 05:47:55.424048
updated_at                       | 2025-12-20 05:47:55.424048
status                           |
expiration_policy_started_at     |
expiration_policy_cleanup_status | 0
expiration_policy_completed_at   |
last_cleanup_deleted_tags_count  |
delete_started_at                |
status_updated_at                |
failed_deletion_count            | 0
next_delete_attempt_at           |
-[ RECORD 2 ]--------------------+---------------------------
id                               | 2
project_id                       | 999999
name                             | test-repo
created_at                       | 2025-12-20 05:47:55.424048
updated_at                       | 2025-12-20 05:47:55.424048
status                           |
expiration_policy_started_at     |
expiration_policy_cleanup_status | 0
expiration_policy_completed_at   |
last_cleanup_deleted_tags_count  |
delete_started_at                |
status_updated_at                |
failed_deletion_count            | 0
next_delete_attempt_at           |
  1. Then run RAILS_ENV=test bundle exec rake gitlab:db:repair_index. This should succeed and show the deduplication:
I, [2025-12-19T21:48:54.150239 #35056]  INFO -- : Processing index 'index_container_repositories_on_project_id_and_name' on table 'container_repositories'...
I, [2025-12-19T21:48:54.150247 #35056]  INFO -- : Index is unique. Checking for duplicate data...
I, [2025-12-19T21:48:54.150254 #35056]  INFO -- : Checking for duplicates in 'container_repositories' for columns: project_id,name...
I, [2025-12-19T21:48:54.150327 #35056]  INFO -- : SQL: SELECT ARRAY_AGG(id ORDER BY id ASC) as ids
FROM "container_repositories"
WHERE "project_id" IS NOT NULL AND "name" IS NOT NULL
GROUP BY "project_id", "name"
HAVING COUNT(*) > 1

W, [2025-12-19T21:48:54.153129 #35056]  WARN -- : Found 1 duplicates in 'container_repositories' for columns: project_id,name
I, [2025-12-19T21:48:54.153308 #35056]  INFO -- : Deleting orphaned references in 'container_repository_states' for column 'container_repository_id'...
I, [2025-12-19T21:48:54.153356 #35056]  INFO -- : SQL: DELETE FROM "container_repository_states" WHERE "container_repository_id" IN (2)
I, [2025-12-19T21:48:54.154665 #35056]  INFO -- : Deleted 1 orphaned references in 'container_repository_states'
I, [2025-12-19T21:48:54.154676 #35056]  INFO -- : Deleting duplicate records from container_repositories...
I, [2025-12-19T21:48:54.154706 #35056]  INFO -- : SQL: DELETE FROM "container_repositories" WHERE id IN (2)
I, [2025-12-19T21:48:54.155332 #35056]  INFO -- : Deleted 1 duplicate records from container_repositories

The table should look something like:

gitlabhq_test=# select * from container_repository_states;
-[ RECORD 1 ]------------+------------------------------
verification_started_at  |
verification_retry_at    |
verified_at              | 2025-12-20 05:47:55.426783+00
container_repository_id  | 1
verification_state       | 0
verification_retry_count | 0
verification_checksum    |
verification_failure     |
project_id               | 999999

gitlabhq_test=# select * from container_repositories;
-[ RECORD 1 ]--------------------+---------------------------
id                               | 1
project_id                       | 999999
name                             | test-repo
created_at                       | 2025-12-20 05:47:55.424048
updated_at                       | 2025-12-20 05:47:55.424048
status                           |
expiration_policy_started_at     |
expiration_policy_cleanup_status | 0
expiration_policy_completed_at   |
last_cleanup_deleted_tags_count  |
delete_started_at                |
status_updated_at                |
failed_deletion_count            | 0
next_delete_attempt_at           |

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu

Merge request reports

Loading