# Backfill traversal_ids on top 10 .com groups

## Production Change

### Change Summary
There was a QA failure in staging yesterday that caused us to temporarily disable the `sync_traversal_ids` feature flag on .com. It's possible the data in the column has fallen out of sync. We would like to run the script below to ensure the column for these groups is back in sync.
The code is a duplicate of the change request successfully completed two days ago.
### Change Details
- Services Impacted - PostgreSQL
- Change Technician - @alexpooley (APAC), @ifarkas (EMEA)
- Change Criticality - C3
- Change Type - changeunscheduled
- Change Reviewer - @ggillies
- Due Date - 2020-05-20 05:00
- Time tracking - 5 minutes
- Downtime Component - None.
### Detailed steps for the change

#### Pre-Change Steps - steps to be completed before execution of the change
None.
#### Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 5 minutes.

- [ ] Execute the following Ruby script to synchronize `namespaces.traversal_ids` values.
```ruby
include Gitlab::Database::MigrationHelpers

group_ids = [2274533, 6547531, 6543, 9970, 4909902, 7819554, 6753833, 4258049, 5892345, 3924854, 4249178]

group_ids.each do |group_id|
  with_lock_retries do
    namespace = Namespace.find(group_id)
    Namespace::TraversalHierarchy.for_namespace(namespace).sync_traversal_ids!
  end
end
```
#### Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 1 minute.

- [ ] Execute the following Ruby script to determine remaining incorrect `namespaces.traversal_ids` values.
```ruby
group_ids = [2274533, 6547531, 6543, 9970, 4909902, 7819554, 6753833, 4258049, 5892345, 3924854, 4249178]

incorrect = group_ids.map do |group_id|
  namespace = Namespace.find(group_id)
  Namespace::TraversalHierarchy.for_namespace(namespace).incorrect_traversal_ids.all
end

if incorrect.flatten.compact.empty?
  puts '*** Success ***'
else
  puts '!!! Failure !!!'
end
```
`*** Success ***` is printed on success, or `!!! Failure !!!` on failure.
### Rollback

#### Rollback steps - steps to be taken in the event of a need to rollback this change
A rollback would not be appropriate upon failure, as the column is currently behind a feature flag and not used. We could archive the old values and restore them if a failure occurs, but this is unnecessary. Instead, if the change fails, we will fix the underlying issue and run this migration again.
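If archiving were ever wanted, a minimal sketch of the snapshot-and-restore idea is below. This is an illustration, not part of the change: the helpers and file path are hypothetical, and the sample rows stand in for values that would really come from the `Namespace` model in a Rails console (e.g. something like `Namespace.where(id: group_ids).pluck(:id, :traversal_ids)`).

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical snapshot/restore helpers for namespaces.traversal_ids.
# Write {namespace_id => traversal_ids} pairs to a CSV archive.
def archive_traversal_ids(rows, path)
  CSV.open(path, 'w') do |csv|
    rows.each { |id, ids| csv << [id, ids.join(' ')] }
  end
end

# Read the archive back into the same {namespace_id => traversal_ids} shape.
def read_archive(path)
  CSV.read(path).to_h do |id, ids|
    [Integer(id), ids.split.map { |s| Integer(s) }]
  end
end

path = File.join(Dir.tmpdir, 'traversal_ids_backup.csv')

# Sample rows reusing two group IDs from this change; the traversal_ids
# values here are illustrative only.
sample = { 9970 => [9970], 6543 => [9970, 6543] }
archive_traversal_ids(sample, path)
restored = read_archive(path)
```

Restoring would then be a matter of writing the archived arrays back to the corresponding namespaces, which is why we consider it unnecessary overhead for a column that nothing reads yet.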
### Monitoring

#### Key metrics to observe
Not applicable. The `traversal_ids` column is behind a feature flag and not yet used, so no key metrics are affected.
### Summary of infrastructure changes
- [ ] Does this change introduce new compute instances?
- [ ] Does this change re-size any existing compute instances?
- [ ] Does this change introduce any additional usage of tooling like Elasticsearch, CDNs, Cloudflare, etc?
Summary of the above
### Changes checklist
- [ ] This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
- [ ] This issue has the change technician as the assignee.
- [ ] Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- [ ] Necessary approvals have been completed based on the Change Management Workflow.
- [ ] Change has been tested in staging and results noted in a comment on this issue.
- [ ] A dry-run has been conducted and results noted in a comment on this issue.
- [ ] SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
- [ ] There are currently no active incidents.