Provide documentation on preferred method of migrating Gitaly servers
Searching through the documentation on https://docs.gitlab.com/, I was unable to find any appropriate documentation on how we officially recommend this type of migration be done. There is some documentation on using rsync: https://docs.gitlab.com/ee/administration/operations/moving_repositories.html.
However, I feel that the rsync approach is outdated compared to the tooling the product now provides through the Project repository storage moves API. Most importantly, the rsync-based documentation offers no easy way to validate that the new repository is equivalent to the source.
GitLab.com's infrastructure team are well versed in moving repositories around (@nnelson is our resident expert on this) and, with the help of the Gitaly team (thanks to @zj-gitlab and @derekferguson's prioritisation), this tooling has largely been integrated into the product now. We no longer need to build tools outside the product.
This has all the virtues of dogfooding: the migration process is well built, well tested and used by the infrastructure team, and migrations can be done with almost no downtime, only a brief interruption per repository. Importantly, the repositories are fully validated before and after the move to ensure that corruption cannot occur.
With this in mind, we should consider updating the repository move documentation and providing an official guide to the Gitaly server migration process, using these tools.
Proposed "Official" Gitaly Migration Process
I propose the following approach, which is very similar to how the Infrastructure team for GitLab.com are handling repository moves.
- Set up new repository servers as Gitaly servers and add them as new shards via the GitLab Admin UI.
- Configure repository weights of new target servers to 100 and source servers to 0. This will ensure that all new repositories will arrive on target servers.
- See screenshot below
- (Optional) Stand up a dedicated Sidekiq node for handling project_update_repository_storage jobs. Configure the node for a maximum concurrency of (say) 4. This is a means of throttling the shard migration jobs so that they don't overwhelm the server.
- Provide a small script for using the new automatic shard selection feature of the repository move API to migrate repositories from the source servers to the target.
- The script will need to iterate over all projects and issue the repository moves via the API
- If the project_update_repository_storage Sidekiq jobs are throttled, the API moves can be issued all at once, and they will be dequeued sequentially and with optimal efficiency. If the jobs are not throttled, care needs to be taken to issue these moves at a pace that the server is able to keep up with (as a starting point, 10 moves per minute could be issued, and this could be tuned up or down).
- The automatic shard selection feature was proposed here: gitaly#3209 (closed) and is documented in !45627 (merged)
- GitLab will perform all validation and will guarantee that the repository is equivalent before and after the move.
- If validation fails for some reason, the move is not completed and the repository remains on the source server.
- Each repository will be offline for a very brief period during the move.
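
As a starting point for the "small script" mentioned above, here is a minimal Python sketch. It is an illustration, not a tested migration tool. It assumes that an admin token can list projects with a repository_storage filter on GET /projects and that a move is requested with POST /projects/:id/repository_storage_moves; both should be checked against the API documentation for your GitLab version. GITLAB_URL, TOKEN and the function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

GITLAB_URL = "https://gitlab.example.com"  # hypothetical instance URL
TOKEN = "REPLACE_ME"  # hypothetical admin personal access token


def api(method, path, params=None, body=None):
    """Call the GitLab REST API and return the decoded JSON response."""
    url = f"{GITLAB_URL}/api/v4{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("PRIVATE-TOKEN", TOKEN)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def project_ids_on(storage, call=api):
    """Yield the ID of every project on the given shard, page by page.

    Assumption: GET /projects accepts a repository_storage filter for admins.
    """
    page = 1
    while True:
        batch = call("GET", "/projects", params={
            "repository_storage": storage, "per_page": 100, "page": page})
        if not batch:
            return
        for project in batch:
            yield project["id"]
        page += 1


def schedule_moves(ids, call=api, per_minute=10, sleep=time.sleep):
    """Request one move per project, pausing between requests.

    No destination is sent, so GitLab is expected to pick the target shard
    from the configured weights. Pass per_minute=None when the Sidekiq jobs
    are already throttled and the moves can be issued all at once.
    """
    moves = []
    # Materialise the IDs first so that projects leaving the source shard
    # do not shift the pagination while moves are being requested.
    for project_id in list(ids):
        path = f"/projects/{project_id}/repository_storage_moves"
        moves.append(call("POST", path, body={}))
        if per_minute:
            sleep(60 / per_minute)
    return moves
```

A run against a source shard named "default" (a placeholder) would look like `schedule_moves(project_ids_on("default"))`. The default of 10 moves per minute mirrors the starting pace suggested above and can be tuned up or down.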