Scheduled repository storage move workers should be urgency=throttled
The GitLab API now supports scheduled repository moves - https://docs.gitlab.com/ee/api/project_repository_storage_moves.html#schedule-a-repository-storage-move-for-a-project
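For reference, scheduling a single move through that endpoint looks roughly like the following Ruby sketch. The instance URL, token, project ID, and destination storage name are all placeholders, and the request is only built here, not sent:

```ruby
require "net/http"
require "json"
require "uri"

# Build (but do not send) a request that schedules a repository storage move
# for one project, per the project_repository_storage_moves API.
def build_move_request(base_url:, token:, project_id:, destination:)
  uri = URI("#{base_url}/api/v4/projects/#{project_id}/repository_storage_moves")
  req = Net::HTTP::Post.new(uri)
  req["PRIVATE-TOKEN"] = token
  req["Content-Type"] = "application/json"
  req.body = JSON.generate(destination_storage_name: destination)
  [uri, req]
end

uri, req = build_move_request(
  base_url: "https://gitlab.example.com", # placeholder instance
  token: "REDACTED",                      # placeholder token
  project_id: 42,                         # placeholder project
  destination: "nfs-file22"               # placeholder Gitaly storage name
)
# To actually send it:
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```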
However, at present, `ProjectUpdateRepositoryStorageWorker` runs on the catchall shard: https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-queue=project_update_repository_storage
This means that we need to drip-feed scheduled repository updates to the API, or risk overwhelming the system with too many concurrent updates, possibly also causing saturation on the catchall fleet along with IO saturation in the Gitaly fleet.
## Proposal
- Make `ProjectUpdateRepositoryStorageWorker` run as `urgency=throttled` and assign a dedicated shard (`repository_updates` or `gitaly_throttled`?) to run these jobs. The Sidekiq selector for these jobs would be `feature_category=gitaly&urgency=throttled`
- Updating the total concurrency of the shard (through the number of pods and the Sidekiq concurrency setting) will allow control over the number of concurrent jobs that can be executed.
- This Sidekiq shard could run in Kubernetes (cc @jarv)
- CirepoM would continue to work as it does right now, but could be simplified to issue all moves in a single batch while continuing to poll for status updates. Migration jobs would run back to back, with the next job starting immediately once the previous one had completed, on both failure and success (at present, we need to time out for failures)
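As a rough illustration of how such a selector would pick up the worker (in GitLab itself the worker class would declare attributes along the lines of `urgency :throttled` and `feature_category :gitaly`), here is a minimal, self-contained model of `key=value&key=value` selector matching. The worker metadata below is made up for the example, not GitLab's real registry:

```ruby
# Illustrative worker-to-attributes mapping; the entries are assumptions.
WORKERS = {
  "project_update_repository_storage" => { feature_category: "gitaly", urgency: "throttled" },
  "post_receive"                      => { feature_category: "source_code_management", urgency: "high" },
  "gitaly_cleanup"                    => { feature_category: "gitaly", urgency: "low" }
}.freeze

# Parse "key=value&key=value" into a Hash of requirements.
def parse_selector(selector)
  selector.split("&").to_h { |pair| pair.split("=", 2) }
end

# Return the queue names whose attributes satisfy every requirement.
def select_queues(workers, selector)
  wanted = parse_selector(selector)
  workers.select do |_queue, attrs|
    wanted.all? { |key, value| attrs[key.to_sym] == value }
  end.keys
end

select_queues(WORKERS, "feature_category=gitaly&urgency=throttled")
# => ["project_update_repository_storage"]
```

Under this model, only the storage-move queue matches both conditions, so a dedicated shard started with that selector would process those jobs and nothing else.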
cc for comments @glopezfernandez @nnelson @zj-gitlab @marin @proglottis
## Rollout Plan
### Prep work
- Any time: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!92 (merged) "Adds relabeling and log configuration for throttled shards"
- Any time: gitlab-com/runbooks!2427 (merged) "Preparatory refactor to allow HPA saturation rules"
- Any time: gitlab-com/runbooks!2426 (merged) "Autogenerate kubernetes HPA alerting rules"
- Monitoring and logging updates for the shards: gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!92 (merged)
- Merge gitlab-org/gitlab!35230 (merged) "Throttle ProjectUpdateRepositoryStorageWorker Jobs". At this point, the jobs will continue to run safely on the catchall fleet.
- Because we are using maximum replicas for throttling, we will also need to ignore these shards in the HPA alert: https://gitlab.com/gitlab-com/runbooks/-/blob/master/rules/kubernetes-hpa.yml#L32-43
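That exclusion could be expressed by filtering the throttled shards out of the alert expression's label matchers. A purely hypothetical sketch (the real rule lives in `rules/kubernetes-hpa.yml`; the alert name, metric names, and label regex here are assumptions, not copied from that file):

```yaml
# Hypothetical: suppress the "HPA at max replicas" alert for throttled shards,
# since running pinned at max replicas is the intended throttling mechanism there.
- alert: HPAMaxedOut
  expr: >
    kube_hpa_status_desired_replicas
      >= on(hpa, namespace) kube_hpa_spec_max_replicas{hpa!~".*-throttled-.*"}
  for: 25m
```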
### Staging
- gitlab-com/gl-infra/k8s-workloads/gitlab-com!288 (merged): Add the throttled shards to non-prod environments
- https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/3766: Exclude the database and throttled jobs in staging (reverted, see https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/3797)
- Confirm the following:
  - Queues are no longer running in catchall
  - Pods are created in staging
  - Queues are running in Kubernetes
  - Queues are processing jobs and logging
### Production
- Change issue: production#2378 (closed)