POC - GitHub integration with GitLab CI without mirroring

Objective

Determine the technical approach for integrating GitHub repositories with GitLab CI without relying on the current mirroring process. This proof of concept (POC) aims to explore more efficient and scalable methods of running GitLab CI pipelines for GitHub-hosted repositories.

Background

Currently, GitLab uses a mirroring process to integrate with GitHub repositories, which has led to performance issues and delays in CI pipeline execution. This POC seeks to address these challenges by exploring alternative approaches that can provide near real-time CI pipeline triggering for GitHub repositories.

Proposed Approaches

We will investigate the following ideas:

  1. Direct Commit Fetching (Idea A):
    • Upon receiving a webhook from GitHub, GitLab CI directly fetches the single commit.
    • Add jobs/pipelines to the CI queue that clone directly from the GitHub repo.
    • The existing mirroring job could run in parallel for full repository synchronization.
      • Option 1: With mirroring and webhooks
        • Source code is mirrored to GitLab, and CI is triggered on every push
      • Option 2: No mirroring and webhooks
        • Pull the branch only
        • Only trigger on webhooks
    • Conclusion: We decided not to explore this idea further because it would not meet the requirement that no source code be stored on GitLab.
  2. Efficient Mirroring (Idea B):
    • Redesign/refactor the mirroring process to be more sophisticated and efficient.
    • Aim for near real-time mirroring with minimal delay (e.g., 1 minute or less).
    • Consider the impact on Sidekiq and Gitaly's performance.
  3. Pure CI from External Source (Idea C):
    • Implement CI that reads directly from the external GitHub repository.
      1. The .gitlab-ci.yml file is hosted on GitLab.com; nothing is mirrored or copied from GitHub.
      2. The runners fetch from GitHub directly.
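
To make the "fetch directly from GitHub" step concrete, here is a minimal sketch of how a push webhook could be turned into a single-commit fetch on a runner. It is an illustration under stated assumptions, not the POC's implementation: the names PushEvent, parse_push_event, and single_commit_fetch_commands are hypothetical, authentication to GitHub is left out, and it assumes the GitHub server allows fetching the pushed commit by its SHA. The payload fields used (ref, after, deleted, repository.clone_url) come from GitHub's push event.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PushEvent:
    """The few fields of a GitHub push webhook this sketch needs (hypothetical type)."""
    clone_url: str
    ref: str
    sha: str


def parse_push_event(body: str) -> PushEvent:
    """Extract the repository URL, ref, and head commit from a push payload."""
    payload = json.loads(body)
    if payload.get("deleted"):
        # A branch deletion has no commit to build.
        raise ValueError("push event deletes a ref; nothing to fetch")
    return PushEvent(
        clone_url=payload["repository"]["clone_url"],
        ref=payload["ref"],
        sha=payload["after"],
    )


def single_commit_fetch_commands(event: PushEvent, workdir: str) -> List[List[str]]:
    """Build the git commands a runner could run to fetch only the pushed commit.

    Assumption: the remote permits fetching this commit by SHA; credentials
    would have to be supplied separately.
    """
    return [
        ["git", "init", workdir],
        ["git", "-C", workdir, "remote", "add", "origin", event.clone_url],
        ["git", "-C", workdir, "fetch", "--depth", "1", "origin", event.sha],
        ["git", "-C", workdir, "checkout", "--detach", "FETCH_HEAD"],
    ]
```

The commands are returned as argument lists rather than executed, so a job could log or review them before passing each one to subprocess.run. With the shallow fetch by SHA, the runner downloads only the pushed commit and GitLab stores no copy of the repository.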

Key Considerations

  • Performance and scalability of each approach
  • Impact on GitLab.com infrastructure
  • User experience and feedback delay for developers
  • Security implications, especially regarding CI job permissions

Success Criteria

  • Trigger a build on GitLab CI as soon as a change is pushed to GitHub.
  • Maintain or improve system stability compared to the current mirroring approach.
  • Provide a seamless experience for users integrating GitHub repositories with GitLab CI.

Scope

  • This feature should be available only for GitLab Ultimate groups and projects, on both GitLab.com and self-managed instances.
  • The initial POC will focus on GitHub Enterprise integration; expansion to GitHub.com will be required in the future.
  • Manual mirror refreshes will continue to have the existing mirroring interval delay (5 min) for all tiers.

Next Steps

  1. Develop POCs for Idea C.
  2. Evaluate the technical feasibility, performance, and scalability of each POC.
  3. Gather feedback from the scalability team, Dedicated team and other Verify engineers.
  4. Consider load testing before shipping any solution.
  5. Investigate how to address the security constraints.
  6. Refine the approach based on findings and feedback.

For additional context and discussions, please refer to this comment.
