Automated Gitaly storage node rebalancing
Problem to solve
Rebalancing Gitaly storage nodes is a time-consuming, laborious, and error-prone process.
Automating it would make it much easier for users to expand their storage fleets.
Intended users
Admins of self-managed GitLab instances
Infra team at GitLab
Further details
Proposal
Prerequisites:
- gitaly#2618 (closed) - Cleanup automatically on success
- #222793 (closed) - Cleanup automatically on failure
The way it could work:
- A Sidekiq job runs once a day (cron?) and checks disk usage on the storage nodes.
- If it identifies nodes with disk usage >70%, it tries to find nodes with usage <60%.
- If no nodes have disk usage <60%, an alert is raised to add more storage to the Gitaly fleet.
- If nodes with disk usage <60% are found:
  - Identify repos on the overloaded node that can be moved (e.g. criteria: big size, high growth rate).
  - Schedule project_update_repository_storage Sidekiq jobs to move the identified repos to the node with low disk usage.
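The steps above could be sketched roughly as follows. This is only an illustration of the decision logic in Python, not GitLab or Gitaly code: `Node`, `Repo`, and `plan_rebalance` are hypothetical names, repositories are ranked by size only (growth rate is left out), and two details are assumptions that the proposal does not specify: the planner stops moving repos off a node once its projected usage is back at or below 70%, and it only picks a target whose projected usage stays below 60% after the move.

```python
from dataclasses import dataclass

HIGH_WATERMARK = 0.70  # nodes above this are treated as overloaded
LOW_WATERMARK = 0.60   # nodes below this may receive repositories


@dataclass
class Node:  # hypothetical model of a Gitaly storage node
    name: str
    used_bytes: int
    total_bytes: int

    @property
    def usage(self) -> float:
        return self.used_bytes / self.total_bytes


@dataclass
class Repo:  # hypothetical model of a repository on a node
    path: str
    size_bytes: int


def plan_rebalance(nodes, repos_by_node):
    """Return (moves, alerts); each move is (repo_path, source, target)."""
    overloaded = [n for n in nodes if n.usage > HIGH_WATERMARK]
    targets = [n for n in nodes if n.usage < LOW_WATERMARK]
    moves, alerts = [], []
    if not overloaded:
        return moves, alerts
    if not targets:
        alerts.append("no node below 60% usage: add storage to the Gitaly fleet")
        return moves, alerts

    # Track projected usage so a single run does not overfill one target.
    projected = {n.name: n.used_bytes for n in nodes}
    for source in overloaded:
        # Largest repositories first (one of the example criteria).
        candidates = sorted(
            repos_by_node.get(source.name, []),
            key=lambda r: r.size_bytes,
            reverse=True,
        )
        for repo in candidates:
            # Assumption: stop once the source is no longer overloaded.
            if projected[source.name] / source.total_bytes <= HIGH_WATERMARK:
                break
            # Assumption: the target must stay below the low watermark.
            fitting = [
                t for t in targets
                if (projected[t.name] + repo.size_bytes) / t.total_bytes
                < LOW_WATERMARK
            ]
            if not fitting:
                continue  # this repo fits nowhere; try a smaller one
            target = min(fitting, key=lambda t: projected[t.name] / t.total_bytes)
            moves.append((repo.path, source.name, target.name))
            projected[source.name] -= repo.size_bytes
            projected[target.name] += repo.size_bytes
    return moves, alerts
```

In a real implementation, each returned move would correspond to scheduling one project_update_repository_storage job, and each alert to whatever alerting mechanism the instance uses. Keeping the planning step as a pure function makes the thresholds and selection criteria easy to test before any repository is actually moved.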
Permissions and Security
Documentation
Testing
What does success look like, and how can we measure that?
Links / references
Edited by James Fargher