Migrate to Hashed Storage
Production Change - Criticality 4 C4
Migrate all repositories to Hashed storage (aka `storage_version: 2`), per https://docs.gitlab.com/ee/administration/raketasks/storage.html

| Field | Value |
|---|---|
| Change Type | Repository Storage |
| Services Impacted | Storage |
| Change Team Members | @skarbek |
| Change Severity | C4 |
| Buddy check or tested in staging | Nominated: @dsylva and @aamarsanaa |
| Schedule of the change | 2019-01-18 |
| Duration of the change | Unknown |
Overview
- SSH into the console node
- Set the env variables `ENV_TO` and `ENV_FROM` as appropriate
- Execute the rake task `gitlab:storage:migrate_to_hashed`
- Validate all is well via `gitlab:storage:legacy_projects`, which should output 0
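The Overview steps can be scripted roughly as follows. This is a minimal sketch, assuming the Omnibus `gitlab-rake` wrapper is on the console node's `PATH` and that the range is passed through the `ENV_FROM`/`ENV_TO` variables exactly as named above (the linked rake-task documentation is the authority on the names your GitLab version expects). All helper names are hypothetical, and `dry_run` defaults to `True` so nothing executes by accident.

```python
import os
import re
import subprocess
from typing import List, Mapping, Optional

# Rake tasks named in the Overview.
MIGRATE_TASK = "gitlab:storage:migrate_to_hashed"
CHECK_TASK = "gitlab:storage:legacy_projects"


def rake_command(task: str) -> List[str]:
    """argv for a rake task via the Omnibus `gitlab-rake` wrapper (assumed entry point)."""
    return ["gitlab-rake", task]


def migration_env(id_from: int, id_to: int,
                  base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and add the range variables (names taken from the plan)."""
    env = dict(os.environ if base is None else base)
    env["ENV_FROM"] = str(id_from)
    env["ENV_TO"] = str(id_to)
    return env


def run_batch(id_from: int, id_to: int, dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute the migration task for one ID range."""
    cmd = rake_command(MIGRATE_TASK)
    if dry_run:
        print(f"ENV_FROM={id_from} ENV_TO={id_to} {' '.join(cmd)}")
    else:
        # check=True raises CalledProcessError if the rake task exits non-zero.
        subprocess.run(cmd, env=migration_env(id_from, id_to), check=True)
    return cmd


def first_integer(output: str) -> Optional[int]:
    """Assumption: the legacy-project count is the first integer the check task prints."""
    match = re.search(r"\d+", output)
    return int(match.group()) if match else None


def legacy_projects_remaining(dry_run: bool = True) -> Optional[int]:
    """Run the validation task; the plan expects it to report 0 when finished."""
    cmd = rake_command(CHECK_TASK)
    if dry_run:
        print(" ".join(cmd))
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return first_integer(result.stdout)
```

For example, `run_batch(0, 1000)` only prints `ENV_FROM=0 ENV_TO=1000 gitlab-rake gitlab:storage:migrate_to_hashed`. Check the real output of `gitlab:storage:legacy_projects` before relying on `first_integer`, since its wording is assumed here.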
What to Monitor
- `ProjectMigrateHashedStorageWorker`: https://dashboards.gitlab.net/d/000000124/sidekiq-workers?orgId=1&refresh=5s&var-worker=ProjectMigrateHashedStorageWorker%23perform&var-database=influxdb-01-inf-gprd
- Sidekiq General: https://dashboards.gitlab.net/d/9GOIu9Siz/sidekiq-stats?orgId=1
- Redis General: https://dashboards.gitlab.net/d/wccEP9Imk/redis?refresh=5m&orgId=1
- PgBouncer Connections: https://dashboards.gitlab.net/d/000000285/pgbouncer-detail?orgId=1&var-environment=gprd&var-fqdn=patroni-04-db-gprd.c.gitlab-production.internal&var-user=gitlab&var-database=gitlabhq_production&var-prometheus=prometheus-01-inf-gprd
- `project_migrate_hashed_storage` queue: https://gitlab.com/admin/background_jobs
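Before starting each new batch it helps to let the `project_migrate_hashed_storage` queue drain. A minimal sketch of that gate follows. `get_queue_size` is a hypothetical callable (for instance, something returning the count shown on https://gitlab.com/admin/background_jobs); the threshold, poll interval, and timeout are illustrative values, not tuned ones.

```python
import time
from typing import Callable


def wait_for_queue_to_drain(
    get_queue_size: Callable[[], int],
    threshold: int = 0,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the queue size is at or below `threshold`.

    Returns True once the queue has drained, or False if `timeout_seconds`
    elapse first. `sleep` and `clock` are injectable so the loop can be tested.
    """
    deadline = clock() + timeout_seconds
    while True:
        size = get_queue_size()
        if size <= threshold:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

Returning `False` on timeout, rather than raising, leaves the decision to throttle (fewer Sidekiq workers, smaller batches) with the operator watching the dashboards.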
Finer Detailed Plan of Action
- During the steps below, if we are impacting production, throttle where appropriate by:
  - adjusting the number of workers on Sidekiq
  - batching fewer jobs
- Set the range from 0 to 1000, then monitor the queue and the dashboards listed under What to Monitor to ensure production is not suffering
- Set the range from 1000 to 4000 and repeat
- Set the range from 4000 to 16000 and repeat
- Set the range from 16000 to 256000 and repeat
- Keep increasing the range (and therefore the batch size) and repeating until no legacy storage projects are left
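The ranges in this plan can be generated rather than typed by hand. This sketch reproduces the four planned ranges and then, as an assumption the plan leaves open, keeps multiplying the upper bound by 4 until the highest project ID (`max_project_id`, which would have to be looked up separately) is covered. Consecutive ranges share an endpoint, exactly as written in the plan. The function name is hypothetical.

```python
from typing import Iterator, Sequence, Tuple

# Upper bounds from the plan: 0-1000, 1000-4000, 4000-16000, 16000-256000.
PLANNED_BOUNDS = (1000, 4000, 16000, 256000)


def batch_ranges(max_project_id: int,
                 bounds: Sequence[int] = PLANNED_BOUNDS,
                 growth: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield (id_from, id_to) pairs that together cover 0..max_project_id."""
    if not bounds or bounds[0] <= 0:
        raise ValueError("bounds must start with a positive upper bound")
    if any(b <= a for a, b in zip(bounds, bounds[1:])):
        raise ValueError("bounds must be strictly increasing")
    if growth < 2:
        raise ValueError("growth must be at least 2")
    lower = 0
    planned = iter(bounds)
    while lower < max_project_id:
        # Use the next planned bound; once those run out, widen geometrically
        # (growth factor is an assumption, not part of the plan).
        upper = next(planned, lower * growth)
        upper = min(upper, max_project_id)
        yield lower, upper
        lower = upper
```

Each yielded pair corresponds to one setting of `ENV_FROM`/`ENV_TO` in the Overview steps, with a pause to monitor between pairs.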
Mitigation
- A dedicated Sidekiq node with 4 workers has been spun up to prevent overloading our existing Sidekiq fleet: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/4869#note_132212272
- Code changes have been made to set the `read_only` flag as late as possible: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/24128
Tested?
- Staging is currently undergoing this migration
- When batching 1 million projects at a time, the service appears to be running fine
- We do not have the metrics listed above available for viewing in staging
What is this
- The paths under which repositories are stored are changing from `.../skarbek/test0` to `...<some_hash_value_based_on_our_code>`
- This is being done because hashed storage will be the only supported path layout for file storage of repositories in upcoming editions of GitLab: https://gitlab.com/gitlab-org/gitlab-ee/issues/8690
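For context on what `<some_hash_value_based_on_our_code>` looks like: as GitLab documents hashed storage, the directory is derived from the SHA-256 hex digest of the project's numeric ID, fanned out into two directory levels under `@hashed/`. The sketch below reproduces that scheme for illustration only; the function names are hypothetical, and the repository storage documentation for the running GitLab version remains authoritative.

```python
import hashlib


def hashed_disk_path(project_id: int) -> str:
    """Disk path (relative to the storage root) for a project under Hashed storage."""
    digest = hashlib.sha256(str(project_id).encode("utf-8")).hexdigest()
    # Two fan-out levels from the first four hex characters keep directories small.
    return f"@hashed/{digest[:2]}/{digest[2:4]}/{digest}"


def hashed_repo_path(project_id: int) -> str:
    """Path of the project's main repository."""
    return hashed_disk_path(project_id) + ".git"


def hashed_wiki_path(project_id: int) -> str:
    """Path of the project's wiki repository."""
    return hashed_disk_path(project_id) + ".wiki.git"
```

Because the path depends only on the project ID, renaming or transferring a project no longer moves its repository on disk, which the namespace-based legacy paths required.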
Something to be careful of
Edited by John Skarbek