
Adjust delay used to spread jobs in GitHub Import

What does this MR do and why?

The improved_spread_parallel_import method, introduced in !109264 (merged) to change how GitHub Import spreads its jobs, ended up making GitHub Import slightly slower: all jobs were enqueued with a delay of at least 1 minute, which added at least 1 minute to each stage. Because 8 stages are affected by this delay, GitHub Import generally took about 8 minutes longer to migrate a project.

This change fixes the problem by making the initial delay start at 1 second.
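
To make the arithmetic concrete, here is a minimal Ruby sketch of the extra wait before and after the change. It is illustrative only and is not GitLab's code: the constant and method names are hypothetical, and it assumes the affected stages run one after another so their initial delays add up.

```ruby
# Illustrative arithmetic only; not GitLab's actual code.
AFFECTED_STAGES = 8 # number of stages the description says are impacted

# Total extra wait in seconds, assuming each affected stage starts after
# `initial_delay_seconds` and the stages run sequentially (assumption).
def added_wait_seconds(initial_delay_seconds, stages: AFFECTED_STAGES)
  initial_delay_seconds * stages
end

before_fix = added_wait_seconds(60) # 1-minute minimum delay per stage
after_fix  = added_wait_seconds(1)  # 1-second initial delay per stage

puts "before: #{before_fix / 60} min, after: #{after_fix} s"
```

Under these assumptions the overhead drops from minutes to seconds; the real saving depends on how many stages actually enqueue jobs for a given project.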

Fixes: #391230 (closed)

MR that introduced the method: !109264 (merged)

Screenshots or screen recordings

How to set up and validate locally

Because most jobs are spread in batches of 1000, the additional delay is only applied after reading 1000 records from GitHub. To test, reduce the batch size to a lower number, for example 10. This way, a delay of 1 minute is added for every ten jobs enqueued.
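
The batch behaviour described above can be sketched as follows. This is a hedged illustration, not the importer's implementation: `job_delay`, `BATCH_SIZE`, `BATCH_DELAY`, and `INITIAL_DELAY` are hypothetical names, and the formula is an assumption that matches the description (1-second starting delay, plus 1 minute per full batch already enqueued).

```ruby
# Hypothetical values for local testing; the real ones live in the importer.
BATCH_SIZE    = 10 # reduced from 1000 so the delay is easy to observe
BATCH_DELAY   = 60 # seconds added for every full batch already enqueued
INITIAL_DELAY = 1  # seconds; the starting delay described in this MR

# Delay in seconds for the job at the given zero-based index.
# Integer division groups jobs into batches: indexes 0..9 form batch 0,
# 10..19 form batch 1, and so on.
def job_delay(job_index, batch_size: BATCH_SIZE, batch_delay: BATCH_DELAY,
              initial_delay: INITIAL_DELAY)
  (job_index / batch_size) * batch_delay + initial_delay
end

# Delays for the first 25 jobs; print the distinct values that occur.
delays = (0...25).map { |index| job_delay(index) }
p delays.uniq
```

With a batch size of 10, the first ten jobs share the 1-second delay and each later group of ten is pushed back by a further minute, which is the pattern to look for in the Sidekiq dashboard.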

  1. Enable GitHub Import in the settings (Admin -> Settings -> General -> Visibility and access controls -> Enable GitHub)

  2. Trigger an import via API or UI

curl --location --request POST 'http://gdk.test:3000/api/v4/import/github' \
--header 'Authorization: Bearer <GITLAB ACCESS TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "personal_access_token": "<GITHUB ACCESS TOKEN>",
    "repo_id": "238972",
    "target_namespace": "root",
    "new_name": "rspec-core",
    "optional_stages": {
        "single_endpoint_issue_events_import": true,
        "single_endpoint_notes_import": true,
        "attachments_import": false
    }
}'
  3. Check the delay added to the Sidekiq jobs using the Sidekiq dashboard


Edited by Rodrigo Tomonari
