Don't do anything with capabilities in `--sandbox=none`

self-review; maybe pub fn enables_isolation() (inverse, clearer name). Or is_noop() (less conflict with Option::is_none)?

I’m not 100 % sure on this. I don’t find dropping capabilities to be a sort of sandboxing or isolation, similarly to how I find seccomp is orthogonal.

However, we don’t drop capabilities when uid != 0, indicating that we indeed want to skip that if it can’t work. Maybe we could just make it a warning instead of an error if dropping capabilities fails in --sandbox=none mode? Or add a new switch (similarly to how seccomp is orthogonal to sandboxing and gets its own switch); maybe a special value for --modcaps.

@germag, what’s your opinion?

I don’t find dropping capabilities to be a sort of sandboxing or isolation, similarly to how I find seccomp is orthogonal.

At a technical level one difference with seccomp is that it actually requires no privileges of the calling process at all (assuming the process has entered no-new-privs).

But changing capabilities requires capabilities! (Also, this does all interact because changing capabilities can be blocked by seccomp too...)

Maybe we could try to tweak things so that we only drop capabilities...the bug i was hitting may be this:

$ oc run --rm -ti --image registry.access.redhat.com/ubi9/ubi:latest rhel9-test 
If you don't see a command prompt, try pressing enter.
# capsh --print
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service

And intersecting that with the default_caps it looks like we're trying to add CAP_MKNOD which isn't in the bounding set. But...again really that's just a more complicated way to no-op in the --sandbox=none mode, and in other flows we'll have created a user namespace so we'll always be able to drop.

We do have --modcaps, so you can manually remove capabilities that aren’t available. I think it makes sense that by default we require all capabilities that we will need for unimpeded operation, which includes mknod. At least we should try, and print a warning if they aren’t available, so that users are aware that certain operations may not work even though virtiofsd runs as root.

(When not running a root, I think it’s OK to not modify capabilities at all, because (at least to me personally) it seems obvious that some operations may require privileges that normal users just don’t have.)

As I’ve said, I’d be OK with just a warning: We keep trying to acquire all capabilities, but warn that that wasn’t possible in case of an error (instead of erroring out completely). (Ideally printing what capabilities we tried to set, and what the current set is.)

If you want, after printing the warning, we could still try to at least drop all unneeded capabilities, depending on how much additional complexity that brings. It almost seems better to me if users were to acknowledge the warning message by passing --modcaps to explicitly drop the unavailable capabilities, because if that’s done right, we can drop all capabilities we don’t need, making the warning disappear in the process.

For that cap set will be enough to run virtiofsd with --modcaps=-mknod:-setfcap.

This code:

    if uid == 0 {
        drop_capabilities(fs_cfg.inode_file_handles, opt.modcaps);
    }

it's just an ugly shortcut for --modcaps=-setfcap:-mknod:-setuid:-setgid:-fsetid:-fowner:-dac_override:-chown, capng::apply() will return an error if the process doesn't have CAP_SETPCAP (in older versions of libcap-ng it will try to log the error using syslog, so it could fail because of a missing seccomp filter)

But, I think it is a good idea to check for CAP_SETPCAP first, and also check if we can add each cap and show a warning instead of just failing, but just for sandbox=none?. I guess to be consistent, maybe we should check and show a warning regardless of the sandbox mode. Just adding a new option like --modcaps=-all we still need to check for CAP_SETPCAP, so I think checking each cap is a better option.

Just a note: I think we never defined what should be expected for each sandbox mode (see #43 (closed)), also sandbox=none is not really none, we still setup seccomp filter, that requires --seccomp=none to disable it, and drop caps. So, the seccomp filtering is "disconnected" from the sandbox mode.

mentioned in commit freedesktop-sdk/mirrors/github/ostreedev/ostree@2fe88f80

179	179	})
180	180	}
181	181
	182	// Return whether this sandbox
	183	pub fn is_none(&self) -> bool {
Colin Walters @cgwalters · 1 year ago Author Contributor self-review; maybe `pub fn enables_isolation()` (inverse, clearer name). Or `is_noop()` (less conflict with `Option::is_none`)? Please register or sign in to reply

Don't do anything with capabilities in `--sandbox=none`

Merge request reports

Activity