Reduce performance impact on users when importing projects

Problem to solve

When importing projects using direct transfer, large, complex projects can generate a large volume of Sidekiq jobs. The number of jobs is unconstrained and can lead to pathological behaviour on the instance: excessive load on the database and other subsystems that degrades the experience for users of the GitLab instance, and at times causes complete outages.

This internal issue describes the recent challenges we have run into on GitLab Dedicated.

One of the enhancements we have made is to add Sidekiq pods dedicated to processing this traffic. However, this isn't a long-term solution: it creates overhead for standing up, managing, and tearing down this infrastructure during migrations, and GitLab has to absorb the cost of the additional infrastructure. We need to establish better guardrails around the spawning of unconstrained jobs to better protect other services running on the platform.

Impacted platforms

  • GitLab.com
  • GitLab Dedicated
  • GitLab Dedicated for Government
  • Self-managed GitLab

Proposal

Throttle the number of active Sidekiq jobs that a single import/DT job can generate. The throttle should be configurable. The import/DT job must not fail but should gracefully pace out the spawning of Sidekiq jobs whilst remaining within the throttle limit.

It is anticipated that this will slow down some import/DT jobs, but it helps protect the availability of services and the user experience on our platforms.
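As a rough sketch of what this pacing could look like (assuming a Sidekiq-style coordinator worker; the class names, queue name, and limits below are illustrative assumptions, not existing GitLab code):

```ruby
# Hypothetical sketch only: a coordinator worker that paces the fan-out of
# per-record import jobs so that an import never has more than MAX_ACTIVE
# jobs queued at once. All names and numbers here are illustrative.
require 'sidekiq'
require 'sidekiq/api'

class ImportRecordWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'imports'

  def perform(import_id, record_id)
    # Import a single record (real work elided).
  end
end

class ImportCoordinatorWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'imports'

  MAX_ACTIVE       = 100  # would come from instance-level configuration
  BATCH_SIZE       = 25   # how many jobs to enqueue per pass
  RECHECK_INTERVAL = 30   # seconds before re-evaluating the queue

  def perform(import_id, remaining_record_ids)
    return if remaining_record_ids.empty?

    queued = Sidekiq::Queue.new('imports').size

    if queued >= MAX_ACTIVE
      # Over the limit: do not fail, just check again later.
      self.class.perform_in(RECHECK_INTERVAL, import_id, remaining_record_ids)
      return
    end

    # Enqueue the next slice while staying within the limit.
    budget = [MAX_ACTIVE - queued, BATCH_SIZE].min
    batch  = remaining_record_ids.take(budget)
    rest   = remaining_record_ids.drop(budget)
    batch.each { |record_id| ImportRecordWorker.perform_async(import_id, record_id) }

    # Re-enqueue the coordinator to keep pacing the remainder.
    self.class.perform_in(RECHECK_INTERVAL, import_id, rest) if rest.any?
  end
end
```

Whether the limit is measured per import (for example, a counter keyed by import ID) or per queue, as in this sketch, is an open design choice that affects how the configurable limit would be exposed.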

Permissions for the throttling configuration:

  • GitLab.com and GitLab Dedicated - must only be configurable by GitLab.
  • Self-managed GitLab - Instance administrators.

Review the experience for long-running DT/import jobs to ensure users have an informed and predictable experience. For example, ensure that they can readily see the progress of the job and are informed asynchronously when a long-running job succeeds or fails.

If it's possible to estimate how long a job might take, we should provide that estimate to the user.
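As a hedged illustration of how such an estimate might be derived once the throttle makes throughput roughly predictable (all numbers and names below are assumptions for illustration, not measured values):

```ruby
# Illustrative only: a naive ETA given a throttled, roughly constant throughput.
remaining_jobs  = 12_000  # jobs not yet processed for this import
max_active      = 100     # configured throttle limit (assumed fully utilised)
avg_job_seconds = 2.5     # observed average duration of a single job

# With up to max_active jobs in flight, throughput is ~max_active / avg_job_seconds.
eta_seconds = remaining_jobs * avg_job_seconds / max_active
puts "Estimated time remaining: ~#{(eta_seconds / 60.0).round} minutes"
```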

Rollout considerations

  1. We will need to define the limits for each of the above platforms.
  2. Evaluate the impact of the limit on existing customers.
  3. Communicate the change in advance to customers.

What does success look like, and how can we measure that?

  • We should realise an infrastructure cost reduction by not having to provision additional Sidekiq shards on GitLab Dedicated.