
Kubernetes cgroups delegation fails if cgroups are mounted with nsdelegate

Context

When the cgroups filesystem is mounted with nsdelegate, additional permission checks are enforced when PIDs are moved between different hierarchies. I think this applies even when we use the clone3 syscall to “directly” start a process in a specific cgroup.

The container-optimised OS that GKE uses for its nodes doesn't appear to mount cgroups with this option, which is why our hack to delegate cgroup permissions to the git user in the Gitaly pod works as intended. When I tried to replicate this hack locally in KinD with a small test program and some scripts, I observed a "no such file or directory" (ENOENT) error, as described in the documentation.
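To make the failure mode easier to reproduce, here is a minimal sketch of a probe that tries to move a PID into a cgroup by writing to its cgroup.procs file and reports the errno name on failure. This is not the test program mentioned above. The function name try_move_pid and the example path are hypothetical. Note that ENOENT can also simply mean the path does not exist, so the result needs to be read alongside a check that the target cgroup directory is actually present.

```python
import errno
import os


def try_move_pid(cgroup_dir, pid):
    """Try to migrate `pid` into `cgroup_dir` by writing to cgroup.procs.

    Returns None on success, or the symbolic errno name (e.g. "ENOENT",
    "EACCES") if the open or the write is rejected.
    """
    path = os.path.join(cgroup_dir, "cgroup.procs")
    try:
        # Open write-only without O_CREAT or O_TRUNC: the file is a kernel
        # interface file and must already exist.
        fd = os.open(path, os.O_WRONLY)
        try:
            os.write(fd, str(pid).encode())
        finally:
            os.close(fd)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

For example, running try_move_pid("/sys/fs/cgroup/<delegated-subtree>", os.getpid()) as the git user (the path here is a placeholder) would show whether the migration is accepted or which error the kernel returns.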

Proposal

If we are to continue using the cgroups delegation workaround, we'll need to:

  • validate that nsdelegate is in fact the cause of the permissions issues
  • survey every cloud provider we support with the Helm chart, and on Dedicated and Cells, to determine whether any of them use nsdelegate
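Both steps need a way to tell whether a given node mounts cgroup v2 with nsdelegate. A minimal sketch, assuming the mount options are visible in /proc/mounts on the node (the function names are hypothetical, and the check should ideally be run on the host rather than inside a container):

```python
def cgroup2_mounts(mounts_text):
    """Return (mount_point, options) for each cgroup2 entry.

    `mounts_text` is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "cgroup2":
            result.append((fields[1], fields[3].split(",")))
    return result


def uses_nsdelegate(mounts_text):
    """True if any cgroup2 mount lists the nsdelegate option."""
    return any("nsdelegate" in opts for _, opts in cgroup2_mounts(mounts_text))
```

On a node, this could be called as uses_nsdelegate(open("/proc/mounts").read()) and the result recorded per provider.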

#6006 (closed) investigates a proper fix for the issue, but upstreaming a change to Kubernetes or containerd will take some time.

Edited by James Liu