PoC: job concurrency limit for exploratory purposes
What does this MR do and why?
Related to https://gitlab.com/gitlab-org/gitlab/-/issues/352802
This is a naive PoC of a job concurrency limit, submitted for exploratory purposes. It took me about 1h of exploratory work and I wanted to share the idea.
- The limit check is not concurrency-safe. If multiple jobs are submitted at the same time, there is a risk that many of them get into the runner queue, exceeding the limit.
- It contains some inefficiencies, such as scheduling pipeline processing for later when the whole pipeline is stuck waiting for the concurrency quota to have capacity.
- It uses SQL to count running jobs.
- We keep the builds in a `waiting` state outside the runner queue. Builds don't get to the `pending` state until there is capacity in their concurrency quota.
- The SQL query for the fair scheduling remains untouched.
- The SQL query could possibly keep performing very well at scale. Since only a limited number of builds will make it to the queue, the query will have fewer builds to scan.
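To make the first caveat concrete, here is a minimal Python sketch of a check-then-act limit check. It is not the MR's code: `RunnerQueue`, `has_capacity`, `enqueue`, and the `LIMIT` value are hypothetical names chosen for illustration, and an in-memory list stands in for the SQL count. Two jobs that both run the check before either is enqueued both pass it, so the queue ends up over the limit.

```python
# Hypothetical sketch: all names and the limit value are illustrative
# assumptions, not taken from the MR.
LIMIT = 1  # assumed concurrency limit


class RunnerQueue:
    def __init__(self):
        self.active = []  # builds currently counted against the limit

    def has_capacity(self):
        # Stands in for the SQL query that counts jobs.
        return len(self.active) < LIMIT

    def enqueue(self, build):
        self.active.append(build)


queue = RunnerQueue()

# Step 1: two jobs are submitted at the same time and both run the check
# before either of them has been enqueued.
checks = {build: queue.has_capacity() for build in ("build-a", "build-b")}

# Step 2: each job acts on its own, now stale, check result.
for build, ok in checks.items():
    if ok:
        queue.enqueue(build)

over_limit = len(queue.active) > LIMIT
print(queue.active, over_limit)
```

Making the check and the enqueue a single atomic step (for example, under a lock) would presumably close this gap, but that is outside what this PoC attempts.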
flowchart LR
created --> capacity{full capacity?}
capacity -- Y --> waiting
capacity -- N --> pending
waiting --> capacity
pending --> running
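The flowchart above can be read as a small transition function. The following Python sketch is only an illustration under that reading: `State`, `next_state`, and the `full_capacity` flag are hypothetical names, and the capacity check itself (the SQL count) is assumed to happen elsewhere.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    WAITING = "waiting"
    PENDING = "pending"
    RUNNING = "running"


def next_state(state, full_capacity):
    """Return the next build state according to the flowchart.

    full_capacity is the (assumed) result of the concurrency limit check.
    """
    if state in (State.CREATED, State.WAITING):
        # Both created and waiting builds go through the capacity check.
        return State.WAITING if full_capacity else State.PENDING
    if state is State.PENDING:
        # A pending build is in the runner queue and moves on to running.
        return State.RUNNING
    # The flowchart shows no transition out of running.
    return state


# A build that hits a full quota waits, then proceeds once capacity frees up.
path = [State.CREATED]
for full in (True, True, False, False):
    path.append(next_state(path[-1], full))
print([s.value for s in path])
```

A `waiting` build stays out of the runner queue and is simply re-checked, which is why the fair-scheduling query only ever sees builds that already passed the limit.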
Again, the purpose of this MR is purely to explore the solution space, understand the complexity involved, and see where performance bottlenecks can arise.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.