GitHub importer fails to handle rate limits when importing note attachments

Summary

The GitHub importer fails to handle rate limit errors when importing note attachments and LFS objects, causing incomplete imports. The NoteAttachmentsImporter and LfsObjectDownloadListService lack retry mechanisms for rate-limited requests and treat them as fatal errors instead of rescheduling the affected workers.

Example Project

Customer repository: a large repository (>10 GB) with extensive PR history and attachments

What is the current bug behavior?

Error Analysis (source):

  • from Gitlab::GithubImport::Attachments::ImportMergeRequestWorker
  • from Gitlab::GithubImport::Attachments::ImportNoteWorker
  • from Gitlab::GithubImport::Stage::ImportLfsObjectsWorker

Technical Root Cause (source):

  • ImportMergeRequestWorker and ImportNoteWorker both invoke Gitlab::GithubImport::Importer::NoteAttachmentsImporter which doesn't handle rate limiting
  • Most API requests to GitHub handle rate limiting via the client code, but attachment downloads don't use the API client (the failure mode is sketched below)
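
For illustration, the failing path looks roughly like the sketch below. The class and method names are hypothetical, but the shape matches the root cause: the download is a plain HTTP GET, so a 429 surfaces as a generic error rather than a retry signal.

```ruby
require 'net/http'
require 'uri'

# Hypothetical sketch: attachments are fetched with a plain HTTP GET,
# bypassing the rate-limit-aware API client used for other GitHub requests.
class AttachmentDownloader
  def download(url)
    response = Net::HTTP.get_response(URI.parse(url))

    # A 429 (Too Many Requests) lands in this generic failure path, so the
    # calling worker sees a fatal error instead of a retry signal.
    raise "Download failed with status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    response.body
  end
end
```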

Additional Context (source):

  • GitHub's generic rate limit is approximately 3,000 requests per hour
  • LFS import gets rate limited because note attachment downloads exhaust the shared rate limit
  • We need handling similar to the GitHub client's rate limit implementation (that pattern is sketched after this list)
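
For reference, the client-side pattern can be approximated as follows. This is illustrative only: the wrapper class and error name are placeholders, not the actual Gitlab::GithubImport::Client code, though reading the remaining quota via Octokit's rate_limit is the real API.

```ruby
require 'octokit'

# Illustrative wrapper showing the client-side pattern: check the remaining
# quota before each request and raise a dedicated error that callers can
# translate into a worker reschedule.
class RateLimitAwareClient
  RateLimitError = Class.new(StandardError)

  def initialize(access_token)
    @octokit = Octokit::Client.new(access_token: access_token)
  end

  def with_rate_limit
    # Octokit exposes the remaining quota via rate_limit.remaining.
    raise RateLimitError, 'GitHub rate limit exhausted' if @octokit.rate_limit.remaining.zero?

    yield @octokit
  end
end
```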

Clarification (source):

  • LFS import is being rate limited because note attachment downloads exhaust the shared rate limit
  • This is distinct from other LFS-specific import issues

Proposed Solution (source):

The majority of errors bubble up from NoteAttachmentsImporter, which makes multiple requests to GitHub for each object being imported. We should:

  1. Catch the exception (or better, explicitly check for the 429 status code in the response)
  2. Reschedule the worker when rate limited
  3. Implement this similarly to how we handle rate limiting in the API client (see the sketch below)
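
A minimal sketch of the proposed handling, assuming a Sidekiq worker. The rescue/reschedule flow mirrors the client's approach; the error class, retry delay, and URL helper below are placeholders rather than the actual GitLab implementation.

```ruby
require 'sidekiq'
require 'net/http'
require 'uri'

# Sketch of the proposed fix for the note attachment worker.
class ImportNoteWorker
  include Sidekiq::Worker

  RateLimitedError = Class.new(StandardError)
  RETRY_DELAY = 15 * 60 # seconds; a real implementation could honor Retry-After

  def perform(project_id, note_id)
    import_attachments(project_id, note_id)
  rescue RateLimitedError => e
    # Step 2: reschedule instead of failing the import. perform_in is the
    # standard Sidekiq API for re-enqueueing a job after a delay.
    delay = e.message.to_i.positive? ? e.message.to_i : RETRY_DELAY
    self.class.perform_in(delay, project_id, note_id)
  end

  private

  def import_attachments(project_id, note_id)
    response = Net::HTTP.get_response(URI.parse(attachment_url(project_id, note_id)))

    # Step 1: explicitly check for the 429 status code rather than relying on
    # a generic exception, and carry Retry-After along for the retry delay.
    raise RateLimitedError, response['Retry-After'].to_s if response.code == '429'

    persist(response.body)
  end

  def attachment_url(project_id, note_id)
    # Placeholder; the real importer extracts attachment URLs from note bodies.
    "https://example.com/attachments/#{project_id}/#{note_id}"
  end

  def persist(_body)
    # Placeholder for storing the downloaded attachment.
  end
end
```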