Skip to content

Document scaling issue during Gitaly cluster migration

What does this MR do?

Document warning and solution for failed mass storage moves when using Horizontal Pod Autoscaler (HPA) with Sidekiq pods. Mass storage moves can fail silently due to pod scaling during job execution.

Problem: During mass storage moves, the following sequence can cause failures:

  • Multiple storage move jobs are scheduled, creating high initial CPU load
  • HPA scales up Sidekiq pods to handle the load
  • As jobs progress, CPU load decreases (jobs become I/O-bound waiting on storage)
  • HPA scales down pods that are still executing long-running storage move jobs
  • Terminated jobs leave projects in "Temporary read-only" state without retry
  • Subsequent storage moves fail silently for affected projects

Symptoms:

  • Error logs show "Context deadline exceeded" without actual errors
  • Projects remain stuck in read-only state
  • Bulk REST API calls fail silently
  • Individual project moves return read-only state errors

Related issues

Author's checklist

If you are a GitLab team member and only adding documentation, do not add any of the following labels:

  • ~"frontend"
  • ~"backend"
  • ~"type::bug"
  • ~"database"

These labels cause the MR to be added to code verification QA issues.

Reviewer's checklist

Documentation-related MRs should be reviewed by a Technical Writer for a non-blocking review, based on Documentation Guidelines and the Style Guide.

If you aren't sure which tech writer to ask, use roulette or ask in the #docs Slack channel.

  • If the content requires it, ensure the information is reviewed by a subject matter expert.
  • Technical writer review items:
    • Ensure docs metadata is present and up-to-date.
    • Ensure the appropriate labels are added to this MR.
    • Ensure a release milestone is set.
    • If relevant to this MR, ensure content topic type principles are in use, including:
      • The headings should be something you'd do a Google search for. Instead of Default behavior, say something like Default behavior when you close an issue.
      • The headings (other than the page title) should be active. Instead of Configuring GDK, say something like Configure GDK.
      • Any task steps should be written as a numbered list.
      • If the content still needs to be edited for topic types, you can create a follow-up issue with the docs-technical-debt label.
  • Review by assigned maintainer, who can always request/require the reviews above. Maintainer's review can occur before or after a technical writer review.

Merge request reports

Loading