[Rollout task] Sync CI minutes purchases
Summary
This issue is to handle the rollout/data sync between CustomersDot and GitLab for existing CI minute purchases via a rake task.
Owners
- Team: group::utilization
- Most appropriate Slack channel to reach out to: #g_utilization
- Best individual to reach out to: @vij
Stakeholders
- @amandarueda (PM)
- The support team
The Rollout Plan
- Staging dry-run
- Production rollout
Initial rake task run:
cd /home/gitlab-customers/customers-gitlab-com
sudo -u gitlab-customers RAILS_ENV=production bundle exec rake data_maintenance:sync_ci_minutes_to_gl
Continue rake task (if interrupted by an incident):
cd /home/gitlab-customers/customers-gitlab-com
sudo -u gitlab-customers RAILS_ENV=production bundle exec rake data_maintenance:sync_ci_minutes_to_gl[<id>]
Where <id> is the Order ID you wish to continue from.
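The two invocations above differ only in the bracketed argument, so a small helper can build the task name for either case. This is a minimal sketch, not part of CustomersDot: `sync_task_name` is a hypothetical helper, and it assumes Order IDs are plain non-negative integers, which this issue does not state.

```shell
# Hypothetical helper: prints the rake task name, with an optional
# resume argument. Assumes Order IDs are numeric (an assumption).
sync_task_name() {
  task="data_maintenance:sync_ci_minutes_to_gl"
  case "${1:-}" in
    # No argument: initial run from the beginning.
    '') printf '%s\n' "$task" ;;
    # Anything containing a non-digit is rejected.
    *[!0-9]*) echo "Order ID must be numeric: $1" >&2; return 1 ;;
    # Digits only: resume from the given Order ID.
    *) printf '%s[%s]\n' "$task" "$1" ;;
  esac
}
```

It could be used as `sudo -u gitlab-customers RAILS_ENV=production bundle exec rake "$(sync_task_name "$ORDER_ID")"`, where `ORDER_ID` is a placeholder variable. Quoting the result matters because square brackets are glob characters in most shells.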
Please note:
Tmux should be used to run this task, as we expect it to take around 5 hours. For more information on how to use it, please check the docs.
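One way to follow the tmux advice is to start the task in a detached, named session so it survives a dropped SSH connection. The following is only a sketch under assumptions: it presumes tmux is installed on the host, and both the session name `ci_minutes_sync` and the helper `start_sync_session` are hypothetical.

```shell
SESSION="ci_minutes_sync"  # hypothetical session name

# Hypothetical helper: starts the initial rake task inside a detached
# tmux session, refusing to start if that session already exists.
start_sync_session() {
  if tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "tmux session '$SESSION' already exists; attach with: tmux attach -t $SESSION" >&2
    return 1
  fi
  # -d: do not attach yet; -c: start in the application directory.
  tmux new-session -d -s "$SESSION" -c /home/gitlab-customers/customers-gitlab-com
  # Type the command into the session and press Enter.
  tmux send-keys -t "$SESSION" \
    "sudo -u gitlab-customers RAILS_ENV=production bundle exec rake data_maintenance:sync_ci_minutes_to_gl" Enter
}
```

After calling `start_sync_session`, `tmux attach -t ci_minutes_sync` shows the task output (and any sudo password prompt), and `Ctrl-b d` detaches again without stopping the task.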
Expectations
What are we expecting to happen?
Ci::Minutes::AdditionalPack records created on GitLab.com for each minutes pack purchased on CustomersDot.
What might happen if this goes wrong?
- Increased error rate on GitLab.com for API requests to the create endpoint
- Increased error rate on CustomersDot for Gitlab::SyncMinutesJob processes
- Partial data sync on GitLab.com
If something goes wrong, we can kill the rake task process to prevent further jobs from being queued.
Data integrity issues may result (depending on the problem), but this will not impact our customers, as the synced data is not currently used. A re-sync can rectify problems (idempotent creation is supported), or a data cleanup can take place (removing all Ci::Minutes::AdditionalPack records) without consequence.
What can we monitor to detect problems with this?
Staging
Production
Rollout Steps
Rollout on staging:
- Calculate expected number of packs that should be created (estimate from number of orders)
- Identify orders for verification on GitLab once task is complete
- Run the rake task as above
- Verify expectations on GitLab
- Verify output in temporary log file
Preparation before production rollout
- Ensure that you or a representative in Fulfillment can be available for at least 2 hours after the task has begun (ideally for the full duration of the task). If a different developer will be covering, or an exception is needed, please inform the Fulfillment team in #s_fulfillment_engineering
- Identify any potentially conflicting long-running or data-intensive tasks (cron jobs, etc.)
- Arrange a time to begin the task with @sre-oncall
- Announce on this issue the estimated time this will begin on production
Rollout on production
- Calculate expected number of packs that should be created (estimate from number of orders)
- Identify orders for verification on GitLab once task is complete
- Notify @sre-oncall that the task is about to begin
- Run the rake task as above
- Verify expectations on GitLab.com
- Verify output in temporary log file
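The log check in the staging and production steps can be partly automated by counting matching log lines and comparing them with the expected number of packs. This is a rough sketch only: the issue does not specify the log file's path or line format, so `check_log_count` is a hypothetical helper that takes both the file and the pattern as arguments.

```shell
# Hypothetical helper: check_log_count LOG_FILE PATTERN EXPECTED
# Succeeds when the number of lines matching PATTERN equals EXPECTED.
check_log_count() {
  log_file="$1"; pattern="$2"; expected="$3"
  if [ ! -r "$log_file" ]; then
    echo "Cannot read log file: $log_file" >&2
    return 2
  fi
  # grep -c prints the count of matching lines; it exits 1 when the
  # count is zero, so its exit status is ignored here.
  actual=$(grep -c -e "$pattern" "$log_file" || true)
  echo "expected=$expected actual=$actual"
  [ "$actual" -eq "$expected" ]
}
```

Since the expected count is itself an estimate from the number of orders, a mismatch is a prompt to investigate rather than proof of a failed sync.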
Rollback Steps
- This rollout can be cancelled by killing the rake task process from the tmux session (the process ID will be printed at the start of the rake task).
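The cancellation step can be sketched as a helper that sends a signal to the printed process ID and confirms the process has gone. This is an illustration under assumptions: `stop_sync_task` is a hypothetical helper, and sending SIGTERM first with a 10-second wait is a choice made here, not something the issue prescribes.

```shell
# Hypothetical helper: stop_sync_task PID
# Sends SIGTERM and waits up to about 10 seconds for the process to exit.
stop_sync_task() {
  pid="$1"
  # kill -0 sends no signal; it only checks that the process exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "No running process with PID $pid" >&2
    return 1
  fi
  kill -TERM "$pid"
  i=0
  while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 10 ]; do
    sleep 1
    i=$((i + 1))
  done
  if kill -0 "$pid" 2>/dev/null; then
    echo "PID $pid is still running after SIGTERM" >&2
    return 2
  fi
  echo "PID $pid stopped"
}
```

If the helper returns 2, the process did not exit in time and needs manual attention. The task can later be continued from a given Order ID using the continue form of the rake task.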