Dedupe runner tags script fails if update to a row conflicts with a unique index
For a customer running GitLab 17.8.3, we attempted to run the de-dupe script in $3700665 (comment 2284396203). The customer had upgraded the operating system via a Geo replica, and we were able to detect from SELECT id, name FROM tags WHERE id NOT IN (SELECT MIN(id) FROM tags GROUP BY name);
that there were duplicate tags
entries.
After a few hours, we got the script to work after some wrangling, but it took us a while to resolve a number of failures:
- On the
ci_runner_taggings
table, the script attempts to runrow.update!(tag_id: new_tag_id)
. However, this fails with aPG::UniqueViolation
error because there was an existing row for the giventag_id
andrunner_id
. We worked around this by updating the script to destroy the row in that case:
module Ci
class TempRunnerTagging < Ci::ApplicationRecord
self.table_name = :ci_runner_taggings
self.primary_key = :id
end
end
tagging_models = [Ci::BuildTag, Ci::TempRunnerTagging, TempTaggings]
tagging_models.each do |tagging_model|
tagging_model.include EachBatch;
tagging_model.all.where(tag_id: bad_tags).each_batch(of: 10000) do |batch|
batch.each do |row|
new_tag_id = tag_remap.fetch(row.tag_id)
if tagging_model == Ci::TempRunnerTagging
row_exists = tagging_model.where(tag_id: new_tag_id, runner_id: row.runner_id).exists?
if row_exists
row.destroy!
else
row.update!(tag_id: new_tag_id)
end
else
row.update!(tag_id: new_tag_id)
end
end
end
end
- You might notice the use of
Ci::TempRunnerTagging
here. We ran into this Rails validation error, which appears to have been fixed in GitLab 17.9: !181228 (merged)
/opt/gitlab/embedded/lib/ruby/gems/3.2.0/gems/activerecord-7.0.8.7/lib/active_record/validations.rb:80:in `raise_validation_error': Validation failed: Sharding key can't be blank (ActiveRecord::RecordInvalid)
We should update the snippet accordingly.
- On the
p_ci_builds
table, we also ran into unique index constraints:
/opt/gitlab/embedded/lib/ruby/gems/3.2.0/gems/activerecord-7.0.8.7/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `exec_params': ERROR: duplicate key value violates unique constraint "ci_build_tags_100_tag_id_build_id_partition_id_idx" (PG::UniqueViolation) DETAIL: Key (tag_id, build_id, partition_id)=(261, 2330055, 100) already exists.
We were able to reproduce the problem by doing:
entry = Ci::BuildTag.where(tag_id: 333, build_id: 2330055)
entry.update!(tag_id: 261)
However, we ran a number of SQL queries to confirm the table did NOT have this row with tag_id
261; this looked a case of a corrupted index. After that, we attempted to reindex the partitioned table and the specific index:
REINDEX TABLE gitlab_partitions_dynamic.ci_build_tags_100;
REINDEX INDEX gitlab_partitions_dynamic.ci_build_tags_100_tag_id_build_id_partition_id_idx;
I think I might have asked the customer to run REINDEX TABLE p_ci_builds
as well. None of the steps worked.
When that failed, we just dropped and recreated the unique index:
DROP INDEX index_p_ci_build_tags_on_tag_id_and_build_id_and_partition_id;
CREATE UNIQUE INDEX index_p_ci_build_tags_on_tag_id_and_build_id_and_partition_id
The first issue is probably the most important to handle. I'm not sure how best to handle the third issue; if a unique index is corrupted, I suppose the script could attempt to verify that the duplicate row actually exists. If not, we probably should recreate the unique index.
/cc: @pedropombeiro
ZD: Federal #11657 (closed)