Skip to content
Snippets Groups Projects

Don't do anything with capabilities in `--sandbox=none`

Open Colin Walters requested to merge cgwalters/virtiofsd:unpriv-no-setcap into main
2 unresolved threads

We landed a big switch to use this project in our build/test tooling: https://github.com/coreos/coreos-assembler/pull/3428

However, we then discovered a corner case where one of our CI flows was running as uid 0 (inside a Kubernetes pod) but without the capbility to mutate its capabilities.

Again, the goal of specifying --sandbox=none for this use case is we don't want virtiofsd to do any additional isolation beyond that imposed externally; so change the code flow here to skip capability changes if --sandbox=none was specified.

Merge request reports

Members who can merge are allowed to add commits.

Merge request pipeline #996471002 passed

Merge request pipeline passed for 4f7fd449

Ready to merge by members who can write to the target branch.

Merge details

  • The source branch is 223 commits behind the target branch.
  • 0 commits and 1 merge commit will be added to .
  • Source branch will not be deleted.

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
179 179 })
180 180 }
181 181
182 // Return whether this sandbox
183 pub fn is_none(&self) -> bool {
  • I’m not 100 % sure on this. I don’t find dropping capabilities to be a sort of sandboxing or isolation, similarly to how I find seccomp is orthogonal.

    However, we don’t drop capabilities when uid != 0, indicating that we indeed want to skip that if it can’t work. Maybe we could just make it a warning instead of an error if dropping capabilities fails in --sandbox=none mode? Or add a new switch (similarly to how seccomp is orthogonal to sandboxing and gets its own switch); maybe a special value for --modcaps.

    @germag, what’s your opinion?

    Edited by Hanna Czenczek
    • Author Contributor

      I don’t find dropping capabilities to be a sort of sandboxing or isolation, similarly to how I find seccomp is orthogonal.

      At a technical level one difference with seccomp is that it actually requires no privileges of the calling process at all (assuming the process has entered no-new-privs).

      But changing capabilities requires capabilities! (Also, this does all interact because changing capabilities can be blocked by seccomp too...)

      Maybe we could try to tweak things so that we only drop capabilities...the bug i was hitting may be this:

      $ oc run --rm -ti --image registry.access.redhat.com/ubi9/ubi:latest rhel9-test 
      If you don't see a command prompt, try pressing enter.
      # capsh --print
      Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service=ep
      Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service

      And intersecting that with the default_caps it looks like we're trying to add CAP_MKNOD which isn't in the bounding set. But...again really that's just a more complicated way to no-op in the --sandbox=none mode, and in other flows we'll have created a user namespace so we'll always be able to drop.

    • We do have --modcaps, so you can manually remove capabilities that aren’t available. I think it makes sense that by default we require all capabilities that we will need for unimpeded operation, which includes mknod. At least we should try, and print a warning if they aren’t available, so that users are aware that certain operations may not work even though virtiofsd runs as root.

      (When not running a root, I think it’s OK to not modify capabilities at all, because (at least to me personally) it seems obvious that some operations may require privileges that normal users just don’t have.)

      As I’ve said, I’d be OK with just a warning: We keep trying to acquire all capabilities, but warn that that wasn’t possible in case of an error (instead of erroring out completely). (Ideally printing what capabilities we tried to set, and what the current set is.)

      If you want, after printing the warning, we could still try to at least drop all unneeded capabilities, depending on how much additional complexity that brings. It almost seems better to me if users were to acknowledge the warning message by passing --modcaps to explicitly drop the unavailable capabilities, because if that’s done right, we can drop all capabilities we don’t need, making the warning disappear in the process.

    • For that cap set will be enough to run virtiofsd with --modcaps=-mknod:-setfcap.

      This code:

          if uid == 0 {
              drop_capabilities(fs_cfg.inode_file_handles, opt.modcaps);
          }

      it's just an ugly shortcut for --modcaps=-setfcap:-mknod:-setuid:-setgid:-fsetid:-fowner:-dac_override:-chown, capng::apply() will return an error if the process doesn't have CAP_SETPCAP (in older versions of libcap-ng it will try to log the error using syslog, so it could fail because of a missing seccomp filter)

      But, I think it is a good idea to check for CAP_SETPCAP first, and also check if we can add each cap and show a warning instead of just failing, but just for sandbox=none?. I guess to be consistent, maybe we should check and show a warning regardless of the sandbox mode. Just adding a new option like --modcaps=-all we still need to check for CAP_SETPCAP, so I think checking each cap is a better option.

      Just a note: I think we never defined what should be expected for each sandbox mode (see #43 (closed)), also sandbox=none is not really none, we still setup seccomp filter, that requires --seccomp=none to disable it, and drop caps. So, the seccomp filtering is "disconnected" from the sandbox mode.

    • Please register or sign in to reply
  • Please register or sign in to reply
    Loading