Spike: Migration - File based approach
The goal of this solution is to wrap the current project (or group) export/import in a better user experience, similar to the GitHub/Bitbucket importers, while still using file-based export and import under the hood by orchestrating the export, intermediate storage, and import steps. We can then iterate to replace the more cumbersome aspects of exporting, storing, and importing files with more elegant and robust solutions.
This is a spike to prove out the technical feasibility of the above-described approach and answer some unknowns so that we can create an iterative path toward the solution.
The spike will be timeboxed - up to 5 days.
Details from #227279 (comment 375683638):
File based approach
- For export, add a new bulk export API that accepts a list of group IDs and triggers an export of each group, including its subgroups and projects.
- Add a status endpoint for bulk export that tracks the export progress of each group/project.
- For import, add a bulk import API that accepts the Source instance's credentials (we can figure out how to store those later) and the group IDs on Source. It invokes the export API on Source and uses some sort of polling mechanism to check the bulk export status.
- If the export status reports that a group export (without projects) succeeded, download the group export and import it via the usual route.
- Do the same for projects: poll the endpoint, download the project archive, import it.
- Repeat until all projects are done.
- Fail if the timeout is exceeded.
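The flow above can be sketched as a small polling loop. This is only an illustration in Python, not the GitLab implementation (which would live in Ruby and Sidekiq). The `SourceClient` methods, the `group:`/`project:` entity keys, and the `started`/`finished`/`failed` status strings are all hypothetical names chosen for the sketch; none of them are existing GitLab APIs.

```python
import time
from typing import Callable, Dict, List, Protocol


class SourceClient(Protocol):
    """Hypothetical client for the Source instance (not an existing GitLab API)."""

    def start_bulk_export(self, group_ids: List[int]) -> None: ...
    def bulk_export_status(self) -> Dict[str, str]: ...
    def download_export(self, entity: str) -> bytes: ...


class BulkImportTimeout(Exception):
    """Raised when the exports do not all finish before the deadline."""


def run_bulk_import(
    source: SourceClient,
    group_ids: List[int],
    import_archive: Callable[[str, bytes], None],
    timeout_s: float = 3600.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Trigger the export on Source, poll its status, import each finished entity once."""
    source.start_bulk_export(group_ids)
    deadline = clock() + timeout_s
    imported: List[str] = []

    while True:
        # Assumed status shape: {"group:1": "finished", "project:10": "started", ...}
        status = source.bulk_export_status()
        failed = [e for e, s in status.items() if s == "failed"]
        if failed:
            raise RuntimeError(f"export failed for: {failed}")

        ready = [e for e, s in status.items() if s == "finished" and e not in imported]
        # Within a single poll, handle groups before projects. This is a
        # simplification: it does not guarantee a project's parent group
        # was imported in an earlier poll.
        for entity in sorted(ready, key=lambda e: (not e.startswith("group:"), e)):
            import_archive(entity, source.download_export(entity))
            imported.append(entity)

        if status and len(imported) == len(status):
            return imported
        if clock() >= deadline:
            pending = sorted(set(status) - set(imported))
            raise BulkImportTimeout(f"timed out with pending exports: {pending}")
        sleep(poll_interval_s)
```

Passing `clock` and `sleep` as parameters is a design choice for the sketch: it lets the timeout path be exercised without real waiting. Note that `download_export` returns the whole archive in memory here, which would not be acceptable for the very large exports discussed under the concerns below.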
Immediate concerns:
- No matter which approach we take, we need to keep scalability in mind and test with extreme data sets: thousands of groups and projects. I have seen 45 GB project exports that constantly failed in Sidekiq and only succeeded via a manual infra export that took 4 hours. And that is just one project. This is not a concern for the spike, but for the overall approach.
- Bulk export probably needs to be throttled so that it does not use up too many resources.
- If we download the file from Source directly in Sidekiq, does that skip Workhorse? Can we prevent that and do the download through Workhorse? Downloading directly in Sidekiq is not good because fetching a 45 GB archive occupies a Sidekiq thread that could have been doing more useful work. This applies to any solution that involves direct automatic file download.
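As a rough illustration of the throttling concern, the sketch below caps how many exports run at once using a bounded worker pool. It is written in Python purely for illustration and assumes a hypothetical `export_one` callable that exports a single group or project. A real implementation would more likely rely on Sidekiq queue or concurrency limits, and the right limit would need to be found by testing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def export_throttled(
    entities: Iterable[str],
    export_one: Callable[[str], None],  # hypothetical per-entity export function
    max_concurrent: int = 2,
) -> Dict[str, str]:
    """Run export_one for every entity with at most max_concurrent in flight."""
    results: Dict[str, str] = {}
    # The pool size is the throttle: extra submissions wait in the pool's queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {entity: pool.submit(export_one, entity) for entity in entities}
        for entity, future in futures.items():
            try:
                future.result()
                results[entity] = "finished"
            except Exception as exc:  # record the failure and keep going
                results[entity] = f"failed: {exc}"
    return results
```

One failed export is recorded in the result map rather than aborting the batch, which matches the idea of a status endpoint that reports per-group and per-project progress.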