Real-time external GitLab CI for repositories hosted on GitHub
Problem to solve
GitLab CI is a market-leading CI solution, and people who use GitHub for source code management and code review want to use GitLab for CI. The current workaround, mirroring, introduces delays that make CI less than real-time, so developers wait longer for feedback.
Further details
CI/CD for external projects was built in a single release (#4839, closed) with a high degree of urgency so that there was a workaround for Gemnasium customers. Mirroring was selected because it was the only pathway viable in a single release to hit the 10.6 milestone, and it required collaboration from multiple teams. It was not selected on the basis of being a good foundation for external CI/CD with low latency and scalability.
Mirroring introduces steps into a CI/CD process that are not common in other systems:
- GitLab must be notified of the change by the external system via a webhook (a few seconds)
- GitLab must queue the mirror job (mirroring is not a targeted single-ref update; it fetches all differences across all refs, which makes the job more computationally expensive than receiving a simple Git push)
- GitLab must rate limit the mirror queue, because mirroring can very easily be used to generate large amounts of write traffic (one source repo could be mirrored to 1,000 projects on GitLab.com, so 1 write to GitHub triggers 1,000 mirror operations)
In a typical CI system, none of these are concerns because the CI runner simply clones directly from the source repo when the push is received.
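To make the cost difference concrete, here is a minimal sketch in Python contrasting a mirror-style fetch with a targeted single-ref fetch, plus the fan-out arithmetic from the last bullet. The function names are hypothetical and the exact Git arguments are an assumption for illustration; this is not how GitLab's mirroring is actually implemented.

```python
from typing import List


def mirror_fetch_args(url: str) -> List[str]:
    """Mirror-style update: fetch every ref and prune deleted ones (assumed shape)."""
    return ["git", "fetch", "--prune", url, "+refs/*:refs/*"]


def targeted_fetch_args(url: str, ref: str) -> List[str]:
    """CI-style update: shallow-fetch only the ref that was pushed."""
    return ["git", "fetch", "--depth=1", url, ref]


def mirror_operations(writes: int, mirrored_projects: int) -> int:
    """Each write to the source repo triggers one mirror job per mirroring project."""
    return writes * mirrored_projects
```

With the numbers from the list above, `mirror_operations(1, 1000)` gives 1,000 mirror jobs for a single push, each fetching all refs, whereas a runner using the targeted form fetches one ref once per job.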
Proposal
If real-time external GitLab CI is important (problem validation), investigation needs to be conducted to evaluate the correct technical approach (it shouldn't be assumed that mirroring is the correct approach given the feature was never designed to solve this problem). Possible approaches:
- Idea A: GitLab CI directly fetches from external repository source - upon receiving a webhook from GitHub, GitLab CI adds jobs/pipelines to the CI queue that clone directly from the GitHub repo. In parallel, the mirroring job could run as usual.
- Idea B: Redesign/refactor mirroring to be more sophisticated and efficient so as to handle real-time mirroring. Note that this will always be slower, because many more operations need to happen before a CI job can start cloning. Is a 1-minute delay acceptable? What is an acceptable replication delay? How much extra pressure does this put on Sidekiq and Gitaly?
- Idea C: Directly fetch from external repository source without the mirroring job running at all, pure CI.
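As a rough illustration of the direct-fetch path shared by Ideas A and C, the sketch below turns a GitHub push webhook into a job description that points straight at the GitHub repo. The `CiJob` type and both function names are hypothetical; the payload fields (`ref`, `after`, `deleted`, `repository.clone_url`) and the `X-Hub-Signature-256` HMAC scheme are GitHub's, but how GitLab CI would actually queue such a job is not specified here and is left out.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CiJob:
    """Hypothetical job description: what a runner needs to clone directly."""
    clone_url: str
    ref: str
    sha: str


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check the X-Hub-Signature-256 header against an HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header)


def job_from_push(body: bytes) -> Optional[CiJob]:
    """Build a job from a push event; return None when the ref was deleted."""
    event = json.loads(body)
    if event.get("deleted"):
        return None  # nothing to build for a deleted branch or tag
    return CiJob(
        clone_url=event["repository"]["clone_url"],
        ref=event["ref"],
        sha=event["after"],
    )
```

Under Idea A the mirror job would still be scheduled alongside this; under Idea C the returned job would be the only work triggered by the push.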