Automated rebalancing of Gitaly storage nodes

Problem to solve

Rebalancing Gitaly storage nodes is a time-consuming, laborious, and error-prone process.

Automating it would make it much easier for users to expand their storage fleets.

Intended users

Admins of self-managed GitLab instances

The infrastructure team at GitLab

Further details

Proposal

A prerequisite for this would be progress on the following issue: https://gitlab.com/gitlab-org/gitlab-ce/issues/63580

The way it could work:

  • A Sidekiq job runs once a day (possibly scheduled via cron) and checks disk usage on the storage nodes.
  • If it finds nodes with disk usage >70%, it tries to find nodes with usage <60%.
  • If no node has disk usage <60%, it raises an alert to add more storage to the Gitaly fleet.
  • If nodes with usage <60% are found, it:
    • identifies repos on the overloaded node that can be moved (example criteria: large size, high growth rate)
    • schedules project_update_repository_storage Sidekiq jobs to move the identified repos to a node with low disk usage
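
The planning step above could be sketched roughly as follows. This is only an illustration in Python, not GitLab code: the `Node` and `Repo` types, `plan_rebalance`, and the watermark constants are hypothetical names. It also makes two assumptions that the proposal does not state: repos are picked largest first (growth rate is ignored), and a move is only planned if the target stays below 60% afterwards. A real implementation would live in a Sidekiq worker and schedule project_update_repository_storage jobs for each planned move.

```python
from dataclasses import dataclass, field

HIGH_WATERMARK = 0.70  # nodes above this are treated as overloaded
LOW_WATERMARK = 0.60   # nodes below this may receive repositories


@dataclass
class Repo:  # hypothetical type for illustration
    project_id: int
    size_bytes: int


@dataclass
class Node:  # hypothetical type for illustration
    name: str
    capacity_bytes: int
    used_bytes: int
    repos: list = field(default_factory=list)

    @property
    def usage(self):
        return self.used_bytes / self.capacity_bytes


def plan_rebalance(nodes):
    """Return (moves, alerts); each move is (project_id, source, target)."""
    moves, alerts = [], []
    # Track projected usage so planned moves do not overfill a target.
    projected = {n.name: n.used_bytes for n in nodes}
    overloaded = [n for n in nodes if n.usage > HIGH_WATERMARK]
    targets = [n for n in nodes if n.usage < LOW_WATERMARK]

    if overloaded and not targets:
        alerts.append("no node below 60% usage: add storage to the Gitaly fleet")
        return moves, alerts

    for source in overloaded:
        # Assumption: consider the largest repositories first.
        for repo in sorted(source.repos, key=lambda r: r.size_bytes, reverse=True):
            if projected[source.name] / source.capacity_bytes <= HIGH_WATERMARK:
                break  # source is no longer overloaded
            # Assumption: the target must stay below the low watermark after the move.
            candidates = [
                t for t in targets
                if (projected[t.name] + repo.size_bytes) / t.capacity_bytes < LOW_WATERMARK
            ]
            if not candidates:
                continue
            # Prefer the emptiest candidate.
            target = min(candidates, key=lambda t: projected[t.name] / t.capacity_bytes)
            projected[source.name] -= repo.size_bytes
            projected[target.name] += repo.size_bytes
            moves.append((repo.project_id, source.name, target.name))
    return moves, alerts
```

Separating planning from scheduling keeps the decision logic easy to test without touching real storage nodes; the returned moves can be reviewed or dry-run before any job is enqueued.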

Permissions and Security

Documentation

Testing

What does success look like, and how can we measure that?

Links / references

/cc @gitlab-com/gl-infra
