GitHub importer fails to handle rate limits when importing note attachments
Summary
GitHub importer fails to handle rate limit errors when importing note attachments and LFS objects, causing incomplete imports. The NoteAttachmentsImporter and LfsObjectDownloadListService lack retry mechanisms for rate-limited requests, treating them as fatal errors instead of rescheduling the workers.
Example Project
Customer repository: Large repository with extensive PR history and attachments (>10GB)
What is the current bug behavior?
Error Analysis (source):
- Errors raised from Gitlab::GithubImport::Attachments::ImportMergeRequestWorker
- Errors raised from Gitlab::GithubImport::Attachments::ImportNoteWorker
- Errors raised from Gitlab::GithubImport::Stage::ImportLfsObjectsWorker
Technical Root Cause (source):
- ImportMergeRequestWorker and ImportNoteWorker both invoke Gitlab::GithubImport::Importer::NoteAttachmentsImporter, which doesn't handle rate limiting
- Most API requests to GitHub handle rate limiting via the client code, but attachment downloads don't use the API client (see the sketch below)
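To make the failure mode concrete, here is a minimal, hypothetical sketch of an attachment download that bypasses the rate-limit-aware API client. The download_attachment helper and the error it raises are illustrative, not the actual importer code:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper mirroring the unprotected download path: the
# request goes straight to the attachment URL rather than through the
# GitHub API client, so nothing inspects the rate limit headers or
# the 429 status code.
def download_attachment(url)
  response = Net::HTTP.get_response(URI.parse(url))

  # A 429 (Too Many Requests) falls into this branch and is treated
  # as a fatal error, aborting the import stage.
  raise "Download failed with status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```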
Additional Context (source):
- GitHub's generic rate limit is approximately 3,000 requests per hour
- LFS import gets rate limited as a result of note attachment requests exhausting the rate limit
- The importers need handling similar to the GitHub API client's rate limit implementation (sketched below)
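For reference, a hedged sketch of the client-style check the last bullet refers to. The RateLimitError class and the way the headers are read are illustrative of common GitHub API conventions, not the exact GitLab implementation:

```ruby
require 'net/http'

# Illustrative only: a dedicated error type lets callers distinguish
# "back off and retry later" from genuinely fatal failures.
class RateLimitError < StandardError; end

# Inspect the response the way an API client would: an explicit 429,
# or an exhausted quota reported via the X-RateLimit-Remaining header,
# is surfaced as a RateLimitError instead of a generic failure.
def check_rate_limit!(response)
  remaining = response['X-RateLimit-Remaining']&.to_i

  if response.code == '429' || remaining == 0
    raise RateLimitError, 'GitHub rate limit exceeded'
  end
end
```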
Clarification (source):
- LFS import is being rate limited as a result of note attachment requests consuming the rate limit
- This is distinct from other LFS-specific import issues
Proposed Solution (source):
The majority of errors bubble up from NoteAttachmentsImporter, which makes multiple requests to GitHub for each object being imported. We should:
- Catch the exception (or better, explicitly check for the 429 status code in the response)
- Reschedule the worker when rate limited
- Implement this the same way rate limiting is handled in the API client (see the worker sketch below)
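A minimal sketch of what the reschedule could look like, assuming a Sidekiq worker. The worker name, the RATE_LIMIT_BACKOFF value, and RateLimitError are placeholders, not the actual GitLab classes:

```ruby
require 'sidekiq'

class RateLimitError < StandardError; end

# Hypothetical worker showing the proposed pattern: rescue the rate
# limit error and put the job back on the queue instead of failing.
class ImportNoteAttachmentWorker
  include Sidekiq::Worker

  # Placeholder backoff; long enough for GitHub's hourly quota
  # window to recover before the job runs again.
  RATE_LIMIT_BACKOFF = 15 * 60 # seconds

  def perform(project_id, note_id)
    download_attachments(project_id, note_id)
  rescue RateLimitError
    # Reschedule rather than fail: the import stage stays alive and
    # the attachment is retried once the quota resets.
    self.class.perform_in(RATE_LIMIT_BACKOFF, project_id, note_id)
  end

  private

  def download_attachments(project_id, note_id)
    # The real importer call (NoteAttachmentsImporter in GitLab) would
    # go here, raising RateLimitError on a 429 per the sketch above.
  end
end
```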