Implement a VFS
Some notes from @eternaleye
Desiderata
-
Capability-secure
Reason: Necessary in order to be a reliable building block for other capability-based systems
Effect: Cannot have '..' behavior; actions on a cap (~fd) must be predictable from their arguments (for membranes). Symlinks need rethinking.
-
Maximal freedom in filenames
Reason: Limiting filenames to Unicode raises semantic questions about equality comparison; treating slashes as special characters causes problems in real-world Unix usage; and permitting NUL bytes allows the VFS to be used as a general nested key-value store API.
Effect: Names are length-delimited byte arrays (&[u8]), with no blessed directory separator (paths are arrays of filenames).
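
A minimal sketch of what "paths are arrays of filenames" could look like in Rust. The type names `Name` and `Path` and the methods `root`, `join`, and `components` are hypothetical, chosen only for illustration; nothing here is an existing Robigalia API.

```rust
/// A single filename: an arbitrary, length-delimited byte string.
/// No byte is special, so b'/' and NUL are both permitted.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Name(Vec<u8>);

impl Name {
    pub fn new(bytes: &[u8]) -> Self {
        Name(bytes.to_vec())
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// A path is a sequence of names, not a separator-joined string.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct Path(Vec<Name>);

impl Path {
    /// The empty path: zero components.
    pub fn root() -> Self {
        Path(Vec::new())
    }

    /// Append one component. The bytes are stored verbatim and are
    /// never split on any separator.
    pub fn join(mut self, name: &[u8]) -> Self {
        self.0.push(Name::new(name));
        self
    }

    pub fn components(&self) -> &[Name] {
        &self.0
    }
}
```

Under this sketch, `Path::root().join(b"usr/local").join(b"bin")` has two components, the first of which contains a slash, and it compares unequal to the three-component path `usr`, `local`, `bin`. Equality is plain byte equality, so no Unicode normalization question arises.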
-
Allow the API of individual nodes to be extended
Reason: Stacked filesystems (such as zip translators or symlink resolution) must be possible to implement without occluding the underlying system. Occluding it is sometimes the right choice, but not always: a true Robigalia codebase may want to access a Posix filesystem, while a Posix program may want to apply a layer that maps extended APIs to base ones (like open() on a symlink resolving it rather than viewing its textual content).
Effect: Trait-like design pattern: each node implements some set of Behaviors, each of which has some number of Interactions, none of which may collide.
-
Design the VFS such that it can be used efficiently in-process
Reason: It is valuable to allow the VFS to be used entirely in-process in some cases, in which case the API may not use seL4 caps at all. For example, having symlink resolution occur in an in-process layer applied by the Posix-env libc may be compelling.
Effect: The VFS should not rely on features only available when using seL4 capabilities.
-
Include the primitives needed for simple atomic behaviors
Reason: Reading a directory is often racy, as a new entry may have been added mid-listing. Other such cases abound.
Effect: Each object in the FS must have a version; all mutations must alter it.
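
One way the version requirement might be used, sketched in Rust: a listing returns the version it was taken at, and a mutation can be made conditional on that version still being current. The names `Dir`, `Stale`, `list`, and `insert_if` are hypothetical, as is the use of a plain `u64` counter for the version and for node identifiers.

```rust
use std::collections::BTreeMap;

/// Returned when a conditional mutation finds a newer version.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub struct Stale {
    pub current: u64,
}

/// An in-memory directory carrying a version counter.
/// (Hypothetical type; node identifiers are plain u64 here.)
#[derive(Debug, Default)]
pub struct Dir {
    version: u64,
    entries: BTreeMap<Vec<u8>, u64>,
}

impl Dir {
    /// Return the entry names together with the version they belong to,
    /// so the caller can later detect that the listing has gone stale.
    pub fn list(&self) -> (u64, Vec<Vec<u8>>) {
        (self.version, self.entries.keys().cloned().collect())
    }

    /// Insert an entry only if the directory is still at `expected`.
    /// Every successful mutation bumps the version.
    pub fn insert_if(&mut self, expected: u64, name: &[u8], node: u64) -> Result<u64, Stale> {
        if self.version != expected {
            return Err(Stale { current: self.version });
        }
        self.entries.insert(name.to_vec(), node);
        self.version += 1;
        Ok(self.version)
    }
}
```

A caller that lists at version `v0` and later calls `insert_if(v0, ...)` either succeeds against exactly the state it saw or gets `Stale` back and can re-list. This is only a compare-and-swap style sketch; the notes do not say what form versions take.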
-
Avoid transfer of process state into the VFS
Reason: read() and write(), which use an implicit offset, are both more complex to implement and have serious concurrency problems, while pread() and pwrite(), their explicit-offset equivalents, are considerably cleaner. This kind of distinction is common, and because stateful behaviors can be implemented on top of stateless ones with little overhead, the latter should be strongly preferred.
Effect: Each interaction should carry its state explicitly
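
The claim that stateful behaviors can be layered on stateless ones can be sketched as follows. The trait `ReadAt`, the in-memory `MemFile`, and the client-side `OffsetReader` are all hypothetical names; the point is only that the offset lives with the caller, not in the VFS.

```rust
/// Stateless interface: every call carries its own offset, as with pread().
/// (Hypothetical trait, for illustration only.)
pub trait ReadAt {
    /// Copy bytes starting at `offset` into `buf`; return how many were copied.
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> usize;
}

/// A trivial in-memory file used to exercise the trait.
pub struct MemFile(pub Vec<u8>);

impl ReadAt for MemFile {
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> usize {
        let data = &self.0;
        // Clamp the offset to the file length before converting to usize.
        let start = offset.min(data.len() as u64) as usize;
        let n = buf.len().min(data.len() - start);
        buf[..n].copy_from_slice(&data[start..start + n]);
        n
    }
}

/// Stateful wrapper that lives entirely on the client side and
/// provides read()-style behavior on top of `ReadAt`.
pub struct OffsetReader<'a, T: ReadAt> {
    inner: &'a T,
    pos: u64,
}

impl<'a, T: ReadAt> OffsetReader<'a, T> {
    pub fn new(inner: &'a T) -> Self {
        OffsetReader { inner, pos: 0 }
    }

    /// Read from the current position and advance it by the bytes read.
    pub fn read(&mut self, buf: &mut [u8]) -> usize {
        let n = self.inner.read_at(self.pos, buf);
        self.pos += n as u64;
        n
    }
}
```

Because `read_at` takes `&self` and no position is stored in the file object, two `OffsetReader`s over the same `MemFile` cannot disturb each other's offsets; each holds its own.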