Improve Testing process to prevent heavy use and slowness of CI
Problem to solve
Nearly every push to gitlab-ce triggers a pipeline, and each pipeline contains upwards of 95 jobs. The vast majority of these are in the test section, and they all run in parallel (or are at least scheduled that way). This puts immense stress on our underlying file servers, which have to provide the data for each job's clone. This has also been a contributing factor to gitlab-com/gl-infra/production#553 (closed).
As GitLab.com continues to scale and grow, this will become a bigger issue. Sure, we can improve throttling in our backend, but the more we relieve the throttling issue, the more this issue will surface, because the next bottleneck will be our file servers. At some point, GitLab-specific repos or other customer-specific repos will start negatively impacting other customers. This will get worse as we grow our team and as more contributors volunteer their time with us. The same applies to customers who use GitLab.com and are growing their projects in a similar fashion.
We may be able to argue that the GitLab repos are abusing GitLab.com.
One could also argue that we could simply expand the fleet of servers.
Further details
- gitlab-com/gl-infra/production#553 (closed)
- Internal conversation: https://gitlab.slack.com/archives/C101F3796/p1541631266354500?thread_ts=1541627649.354200&cid=C101F3796
Proposal
- Determine a proactive way to prevent such a large number of jobs from clogging our systems and thrashing access, which causes headaches for everyday users of our repos
- Think about the future of GitLab.com
What does success look like, and how can we measure that?
- Jobs that are running don't fail because 200 others are already running and we've throttled ourselves
- Anyone can freely clone the repo even when we have 200 jobs running