Document subtle details of request_concurrency, concurrency and limit

Context: https://gitlab.slack.com/archives/C0199KBMY59/p1675684027705959

Toshi: Hi team. Could anyone provide me with a good explanation of request_concurrency? I don’t see a big difference whether the value is 1 or 10. concurrency and limit limit the number of jobs running simultaneously in the end. https://stackoverflow.com/questions/54534387/how-gitlab-runner-concurrency-works https://www.howtogeek.com/devops/how-to-manage-gitlab-runner-concurrency-for-parallel-ci-jobs/

tmaczukin: This is a really subtle setting 🙂

tmaczukin: You need to understand that a lot of things in the Runner process happen concurrently. Go simplifies writing multi-threaded code a lot, and we benefit hugely from that.

tmaczukin: concurrent and limit work as a sort of "slots" or "buckets".

tmaczukin: When the Runner process is started, it creates internal workers (where every worker is a dedicated thread, called a goroutine in Go). Each of these threads is able to request a job from GitLab and, if one is received, handle its execution.

tmaczukin: So, if you have concurrent = 10, you will have 10 internal workers that can handle jobs.

tmaczukin: Runner's main loop iterates over the list of registered runners (represented by individual [[runners]] entries in the configuration file). Each runner is "fed" to the workers, but only if there is a free worker that isn't handling a job at that time.

tmaczukin: This covers the concurrent setting, which defines the global concurrency of one Runner process. Runner will never handle more jobs than the concurrent setting allows - no matter how many runners are registered in the config.toml or what the sum of their limit settings is.
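To make the interplay of the two settings concrete, here is a minimal config.toml sketch - names and values are made up for illustration:

```toml
# Global cap: this Runner process never runs more than 10 jobs at once,
# no matter how many [[runners]] entries exist below.
concurrent = 10

[[runners]]
  name = "runner-a"   # illustrative name
  # Per-entry cap: this runner may occupy at most 4 of the 10 global slots.
  limit = 4

[[runners]]
  name = "runner-b"
  # limit = 0 (the default) means no per-runner cap; only `concurrent` applies.
  limit = 0
```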

tmaczukin: limit is a limiting setting for an individual [[runners]] entry. A runner is fed to a handling worker only if it doesn't already exceed its own limit.

tmaczukin: All of that is handled concurrently in different threads.

tmaczukin: A third thing to cover here is requesting GitLab - and this is what request_concurrency is about.

tmaczukin: Things work like this:
1. A concurrent worker is freed, so it gets a [[runners]] object to try to handle a job for it.
2. A request to GitLab (depending on that [[runners]] entry's URL and token) is started. GitLab uses a long-polling mechanism, so that request may be blocked by GitLab for up to 30 seconds (by default) until it either finds and returns a job for that runner, or returns 204 No Content, meaning there are no jobs to handle.
3. At this point we have a concurrent worker that executed an HTTP request against GitLab and awaits its response.
4. Let's say we have a high concurrent value and not many jobs currently being handled. So another concurrent worker gets a [[runners]] object and asks GitLab for a job. This request is also blocked and awaits a response from GitLab.

Now, if you have no jobs for a given runner but you have concurrent = 1000, you can quickly generate 1000 HTTP requests that are hanging between Runner and GitLab, blocked and awaiting a response. And this may cause problems.

tmaczukin: This is where request_concurrency comes into play.

tmaczukin: With request_concurrency = 10 you say that for a given [[runners]] entry you allow no more than 10 concurrent HTTP requests to GitLab. So even if your concurrent and limit are way bigger than 10 and you're far away from those limits being reached by already-handled jobs, Runner will generate no more than 10 requests.
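In config.toml terms, the limit described here sits on the individual [[runners]] entry - the entry below is an illustrative sketch, not a recommended configuration:

```toml
concurrent = 100

[[runners]]
  name = "big-runner"                  # illustrative name
  url = "https://gitlab.example.com/"  # example URL
  limit = 50
  # Even with 100 global workers and limit = 50, this runner keeps at most
  # 10 job requests in flight against GitLab at any moment.
  request_concurrency = 10
```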

tmaczukin: But when a response is received, the request_concurrency limiter is immediately signalled, so that another concurrent worker is able to make another request, while reading, parsing and executing the already received job is handled by the previous worker in the background and no longer blocks the request_concurrency queue.

tmaczukin: What value this setting should have depends highly on your configuration (Runner's and GitLab's), the number of handled jobs, Runner's capacity, the speed of new job execution, the distribution of jobs, etc. So there is no magical number that we can suggest for it 🙂

tmaczukin: Instead, you should configure Prometheus to track Runner's metrics, observe them, and make configuration decisions based on your setup, what you see in the metrics, and the goal you have in mind 🙂

Joe: @tmaczukin this is an awesome explanation! if this isn’t captured in the handbook / docs somewhere, we definitely should do that. thanks!

Toshi: Thank you @tmaczukin for the detailed explanation. So, can I assume the value of request_concurrency has a kind of buffering effect on the HTTP requests from the Runners to the GitLab server? A small value of request_concurrency helps the GitLab server to steadily distribute the jobs to Runners, but it might slow down the lead time of job creation. However, it doesn’t affect the maximum number of concurrent jobs. What is the job status in the UI for the job requests that exceed the value of request_concurrency? Is it the orange “Pending”? Or the previous status with the gray icon?

tmaczukin: > A small value of request_concurrency helps GitLab server to steadily distribute the jobs to Runners but it might slow down the lead time of job creation. However, it doesn’t affect the maximum number of concurrent jobs.

Not entirely. This limit in fact kicks in when you don't have jobs matching a runner. Let's say that some runner is configured to run multiple concurrent jobs, so it executes multiple concurrent API requests to GitLab to get them, but there are no jobs matching that runner. The API calls will hang here (for the long-polling period configured on GitLab Workhorse), and if the number of already started connections exceeds request_concurrency, Runner will start limiting them. But if the same runner has tens or hundreds or thousands of jobs awaiting it in the queue - the request will be sent and will get a response almost immediately. So you will either hit the concurrency limit first or drain the pending queue before request_concurrency starts being used.

tmaczukin: Think of this setting as a guard gateway that prevents hammering the GitLab API endpoint too much in the case when the given runner doesn't have anything to do.

tmaczukin: Otherwise, its effect on getting pending jobs from the queue is limited.

tmaczukin: As I said in my first message - this is a very subtle setting. You use it to limit requests sent to GitLab (so as to reduce network stress on the GitLab side), but you need to set it to a value that doesn't slow down handling of your queues too much. This is why we have metrics for that - they allow you to find the best value in your case 🙂

Toshi: > Let’s say that some runner is configured to run multiple concurrent jobs so executes multiple concurrent API requests to GitLab to get them, but there are no jobs matching that runner. The API calls will hang here (for the long polling period configured on GitLab Workhorse) and if the number of already started connections will exceed request_concurrency, Runner will start limiting them.

What is the situation when jobs don’t match runners? Do you mean the jobs don’t match the runner by CI/CD tag?

tmaczukin: By tag, by type. There are several factors that GitLab considers when deciding which job should be sent to a runner.

tmaczukin: When the Runner process is started, it constantly asks for new jobs - no matter whether there are jobs for that runner or not. It asks and awaits either a job payload or a response that there are no jobs for now.

tmaczukin: When GitLab receives such a request, it checks several things:
1. What kind of runner it is. An instance runner may be used by any project, a group runner only by projects under that group, and a project runner only by directly connected projects. Based on that, an initial list of jobs is prepared.
2. For instance runners, the next step is to filter out jobs from projects that exceed the pipeline minutes quotas and to enforce our fair scheduling algorithm.
3. Next, GitLab checks whether the runner is marked as protected = true - if yes, jobs from non-protected references are removed from the list.
4. Next, GitLab checks whether the runner is marked as runUntagged = true. If not, all jobs that don't have a tag are removed from the list.
5. Finally, GitLab matches the tags attached to a job with the tags set on the runner. If the runner doesn't have all the tags specified by the job, that job is filtered out.

After that we get a list of jobs that can be assigned to the runner that requested a job. That list may have multiple entries or may be empty. And this is what I've called "whether there are matching jobs".

tmaczukin: If Runner asks for a job, GitLab does all of that stuff to find jobs applicable to that runner, and if there is at least one job on the list - it's sent to the runner. This immediately closes the connection, and depending on Runner's configuration and the load already being handled, Runner will either open a new connection with a new request or wait to restore some capacity before doing that. But if the list is empty - nothing is sent back and the connection hangs, awaiting either a job that lands in the queue (and is sent back) or a timeout after which GitLab's API responds with 204 No Content, meaning that there are no jobs for now.

Consider this example: you have a project which has instance and group runners disabled and uses only your project runner. This project runner handles only jobs from this project - nothing else. If you don't start any pipeline - there are no jobs in the queue for that runner. It will be asking for jobs over and over again, but the connections will hang and eventually be answered with 204 No Content. But if you had tens of pipelines with hundreds of jobs within them, and only this one runner to handle them all, then until the runner runs out of capacity, every request would be answered almost immediately, and the only delay would be caused by networking distance and the time consumed by GitLab Rails and the database to handle the API call.

Toshi: > But if the list is empty - nothing is sent back and the connection hangs awaiting either a job that will land into the queue (and will be sent back) or for a timeout after which GitLab’s API responds with 204 No Content meaning that there are no jobs for now. Consider the example: you have a project which has instance and group runners disabled and uses only your project runner. This project runner is handling only jobs from this project - nothing else. If you don’t start any pipeline - there are no jobs in the queue for that runner. It will be asking for jobs over and over again, but the connections will hang and eventually will be responded with 204 No Content.

Ah, now I understand the situation when a Runner is awaiting a job until its timeout. And the situation when a Runner has 2 or more requests to a GitLab project at the same time is really just for an instant, right? Because of the following explanation you provided above:

> But when a response will be received, the request_concurrency limiting will be immediately signalled, so that another concurrent worker will be able to handle another request, while reading, parsing and executing the already received job will be handled in the previous worker in the background, but will not block the request_concurrency queue anymore