Skip to content

Dedupe runner tags script fails if update to a row conflicts with a unique index

For a customer running GitLab 17.8.3, we attempted to run the de-dupe script in $3700665 (comment 2284396203). The customer had upgraded the operating system via a Geo replica, and we were able to detect from SELECT id, name FROM tags WHERE id NOT IN (SELECT MIN(id) FROM tags GROUP BY name); that there were duplicate tags entries.

After a few hours, we got the script to work after some wrangling, but it took us a while to resolve a number of failures:

  1. On the ci_runner_taggings table, the script attempts to run row.update!(tag_id: new_tag_id). However, this fails with a PG::UniqueViolation error because there was an existing row for the given tag_id and runner_id. We worked around this by updating the script to destroy the row in that case:
module Ci
  class TempRunnerTagging < Ci::ApplicationRecord
    self.table_name = :ci_runner_taggings
    self.primary_key = :id
  end
end

tagging_models = [Ci::BuildTag, Ci::TempRunnerTagging, TempTaggings]
tagging_models.each do |tagging_model|
  tagging_model.include EachBatch;

  tagging_model.all.where(tag_id: bad_tags).each_batch(of: 10000) do |batch|
    batch.each do |row|
      new_tag_id = tag_remap.fetch(row.tag_id)

      if tagging_model == Ci::TempRunnerTagging
        row_exists = tagging_model.where(tag_id: new_tag_id, runner_id: row.runner_id).exists?
        if row_exists
          row.destroy!
        else
          row.update!(tag_id: new_tag_id)
        end
      else
        row.update!(tag_id: new_tag_id)
      end
    end
  end
end
  1. You might notice the use of Ci::TempRunnerTagging here. We ran into this Rails validation error, which appears to have been fixed in GitLab 17.9: !181228 (merged)
/opt/gitlab/embedded/lib/ruby/gems/3.2.0/gems/activerecord-7.0.8.7/lib/active_record/validations.rb:80:in `raise_validation_error': Validation failed: Sharding key can't be blank (ActiveRecord::RecordInvalid)

We should update the snippet accordingly.

  1. On the p_ci_builds table, we also ran into unique index constraints:
/opt/gitlab/embedded/lib/ruby/gems/3.2.0/gems/activerecord-7.0.8.7/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `exec_params': ERROR: duplicate key value violates unique constraint "ci_build_tags_100_tag_id_build_id_partition_id_idx" (PG::UniqueViolation) DETAIL: Key (tag_id, build_id, partition_id)=(261, 2330055, 100) already exists.

We were able to reproduce the problem by doing:

entry = Ci::BuildTag.where(tag_id: 333, build_id: 2330055)
entry.update!(tag_id: 261)

However, we ran a number of SQL queries to confirm the table did NOT have this row with tag_id 261; this looked a case of a corrupted index. After that, we attempted to reindex the partitioned table and the specific index:

REINDEX TABLE gitlab_partitions_dynamic.ci_build_tags_100; 
REINDEX INDEX gitlab_partitions_dynamic.ci_build_tags_100_tag_id_build_id_partition_id_idx; 

I think I might have asked the customer to run REINDEX TABLE p_ci_builds as well. None of the steps worked.

When that failed, we just dropped and recreated the unique index:

DROP INDEX index_p_ci_build_tags_on_tag_id_and_build_id_and_partition_id;
CREATE UNIQUE INDEX index_p_ci_build_tags_on_tag_id_and_build_id_and_partition_id

The first issue is probably the most important to handle. I'm not sure how best to handle the third issue; if a unique index is corrupted, I suppose the script could attempt to verify that the duplicate row actually exists. If not, we probably should recreate the unique index.

/cc: @pedropombeiro

ZD: Federal #11657 (closed)

Edited by Stan Hu