Maybe pass only the nodeSelector config to the runner with an env like:
```
RUNNER_NODESELECTOR = "gitlab-runner=true"
```
that the runner could use to create the pods with:
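(A sketch of the resulting pod spec fragment, assuming such an env var existed; `RUNNER_NODESELECTOR` is only my proposal here, not an existing setting.)

```yaml
# Hypothetical: RUNNER_NODESELECTOR="gitlab-runner=true" would translate
# into a nodeSelector on every generated build pod.
spec:
  nodeSelector:
    gitlab-runner: "true"
```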
Is it possible to get this done in the near future? Or are there any plans to support this? It would help us a lot, since we want to place runners in a separate node pool.
If you are serious about kubernetes for running build jobs, please provide the ability to configure the build pod's tolerations. They have been present and stable in the kubernetes API for well over a year.
Without them, non-CI processes end up on what are supposed to be CI-specific nodes.
I can set tolerations for my gitlab-runner deployment, but I have no way to similarly configure the spawned build pods. If I taint the CI nodes, the build pods (which can't be given tolerations) get scattered across the non-CI nodes in the cluster. If I only use node selectors instead, non-CI workloads get scheduled onto the CI nodes.
The idea of setting a "not-a-ci-node" label on all non-CI nodes and then retrofitting a nodeSelector onto every deployment that isn't gitlab-runner is not a solution.
I couldn't agree more with @stephen6. I'd like to taint the nodes dedicated to CI jobs, and I need a way to add tolerations to both the gitlab-runner pod and the runner-created job pods. A node selector is a weak way of picking nodes in a busy Kubernetes cluster with 100+ apps.
@stephen6 hits the nail on the head. Our use case is that we would like to run a small pool of low-resource instances to run our cluster-components - prometheus, grafana, cluster-autoscaler, cert-manager, etc - and a second pool of larger instances for running all of our generated job pods that can scale down to 0 instances at night when no jobs are running, and up during the day.
Our choices are to either:

- Taint the nodes of the smaller pool and retrofit EVERY one of our cluster components with a node selector and tolerations to force them into this pool, OR
- Taint the nodes of the larger pool and add tolerations and a node selector to just the generated job runner pods (sketched below)
I hope it's plain that the latter of these two choices is a SUBSTANTIALLY simpler solution to the problem. We would REALLY appreciate this feature.
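For illustration, the second option only requires the generated job pods to carry something like the following (a sketch; the `ci` label and taint names are made up for the example):

```yaml
# What each generated job pod would need if the large pool is tainted
# ci=true:NoSchedule and labeled ci=true (both names are hypothetical).
spec:
  nodeSelector:
    ci: "true"
  tolerations:
    - key: "ci"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```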
I've written up a quick proposal to fix this issue over in #3969 (closed). I did not mean to create a duplicate, but unfortunately I did not find this issue while searching, probably because of the missing labels here as well.
This currently isn't planned to be worked on. If you're interested in this feature please upvote it - our Product team uses the upvotes as an indication of a feature's popularity.
@paulcatinean Not at all what this ticket is about. That setting affects the affinity of the gitlab-runner pod. There is currently no way to configure the affinity for the per-job runner pod.
The details are well defined and justified above for those so inclined.
@stephen6 I recently joined as the new PDM for the Runners. I will be taking a closer look at this issue as we plan out our priorities for the next few releases.
The issue description is kind of vague about the actual target. I've read it again and I'm still not sure whether this is about the gitlab-runner pod or the per-job pods.
In the end I would go with the latter, since according to the source code the runners: section is about the per-job pods. So I believe the solution provided above is valid (minus the affinity configuration).
That being said, I'd love to see tolerations and affinities supported through the runners: section, as the issue suggests.
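Roughly what I have in mind in values.yaml (an untested sketch, assuming a chart version that exposes `runners.config` and the Kubernetes executor's `node_selector` / `node_tolerations` settings; the `node=ci` label/taint is a made-up example):

```yaml
# Untested sketch: inject per-job pod scheduling config through the
# chart's runners.config template (node=ci is a placeholder label/taint).
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        [runners.kubernetes.node_selector]
          "node" = "ci"
        [runners.kubernetes.node_tolerations]
          "node=ci" = "NoSchedule"
```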
@kravvcu thank you for the solution, it did work for me to add tolerations to the runner pods (those that run the actual builds).
@DarrenEastman
Though I'd love to see this natively supported by the Helm Chart, putting this working example somewhere easy to find would go a long way already.
And in the same deployment, an env var for the gitlab-runner container:

```yaml
- name: KUBERNETES_NODE_SELECTOR
  value: 'node:ci'
```
Now the runner pod runs on a cheap node 24/7 and the per-job pods spin up on expensive CI nodes that downscale to zero.
We also tested the env var KUBERNETES_NODE_TOLERATIONS: the tolerations are added correctly to the per-job pods, which then start on nodes carrying the corresponding taints.
@kravvcu Yeah. There is no official name for the per-job pod, which got even less clear once the (k8s) Job object was added. It has always been about the per-job pods. The runner itself is delightfully lightweight, so while being able to direct its placement is proper, whether or not one does so is more stylistic than critical. A heavyweight test/build job running on the wrong host can be detrimental to any non-gitlab-runner workloads.
This feature proposal does tie into the broader Kubernetes theme that we are going to be focused on this year. Adding the backlog label for now, as we probably won't be able to get to this on our end in the next couple of release milestones.
After my MR !2324 (merged) was merged last week, with the upcoming release 13.4 you will be able to set Kubernetes node affinity settings via config.toml (which you can even reach from the Helm chart with gitlab-org/charts/gitlab-runner!253 (closed)). Pod affinity and anti-affinity are on the way with !2368 (closed). So this issue could be closed, couldn't it?
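For anyone landing here, this is the kind of stanza such settings end up producing on the generated build pod (plain Kubernetes nodeAffinity syntax; the label key and value are placeholders, and the exact config.toml keys are documented alongside the MR):

```yaml
# Illustrative nodeAffinity on a generated build pod; "node-pool" and "ci"
# are placeholder names, not anything the runner defines.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-pool
                operator: In
                values: ["ci"]
```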
Basically came here to look up how to set podAntiAffinity on the per-job runner pod.
We have enough nodes, but when two especially CPU-heavy jobs end up on the same node, the node can become unresponsive.
A soft anti-affinity would solve this nicely.
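For reference, the kind of soft anti-affinity I mean on the job pods (standard Kubernetes syntax; the `app: ci-job` label used to match other job pods is a placeholder):

```yaml
# Preferred (soft) podAntiAffinity: try not to co-schedule two job pods
# on the same node, but still schedule them if there is no other choice.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: ci-job   # placeholder label carried by the job pods
```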