Lost file metadata in artifacts and images
Problem
Currently, assuming a Linux sandbox (we don't have any others yet), everything is committed to the artifacts with:
- UID/GID 0
- No extended file attributes
When files are created with the setuid/setgid bits set in the sandbox, those bits should be recorded in the artifacts; however, when we check out from ostree in user mode, the setuid/setgid bits are stripped.
This means that, in addition to the above, when we create a bootable image we pack in the files as the sandbox sees them, without their setuid/setgid bits.
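The second symptom can be checked directly on a checkout. The helper below is a hypothetical sketch (the name `find_setid_files` is not part of BuildStream). It walks a directory and lists every non-symlink entry whose mode still carries the setuid or setgid bit. Given the behaviour described above, files that should be setuid/setgid are expected to be missing from its output when run against a user-mode ostree checkout.

```python
import os
import stat


def find_setid_files(root):
    """Return sorted paths under root whose mode has setuid or setgid set."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            # Symlink permission bits are not meaningful; skip them.
            if stat.S_ISLNK(mode):
                continue
            if mode & (stat.S_ISUID | stat.S_ISGID):
                found.append(path)
    return sorted(found)
```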
Solution
BuildStream recently gained a fuse subsystem which the sandbox uses. Currently there is only one fuse layer, which provides copy-on-write semantics for hardlinks (to solve issue #19).
To solve this, I would like to take an approach similar to Yocto's pseudo tool, but instead of using LD_PRELOAD, we would implement it as a fuse layer.
This will mean essentially the following:
- The local artifact cache will have to be able to tell us the real UID/GID, file attributes and extended attributes of any file; for the ostree artifact cache this can be done by following the ostree source code of `ostree ls`
- The sandbox will use a fuse layer to present the real attributes to the sandbox environment; this will be a separate fuse layer from the existing one for copy-on-write hardlinks.
- The fuse layer will need to store the real attributes introspected from the artifact cache in a temporary local store; an in-memory sqlite database could be a good choice if this is too demanding for simple Python data structures.
- The fuse layer will handle filesystem callbacks in such a way that:
  - Calls to chown are redirected to the local store and not applied to the underlying filesystem
  - Setting extended attributes always succeeds, but the values are stored in the temporary store and not applied to the underlying filesystem
  - Attributes and ownership are always read from the store rather than from the underlying filesystem
- When the fuse layer is unmounted, we need to persist the attributes (real UID/GID, xattrs, etc.) recorded by the fuse layer, or otherwise obtain that data
- When committing the artifact, the artifact cache API needs some interface for accepting the ownership and attributes separately from the files being committed. For the ostree artifact cache implementation, these can be applied with the commit modifier callbacks; for other stores, such as tarballs, it's just as easy.
- The sandbox will have to orchestrate the fuse layer. It is required on the root filesystem at all times and in the read-write /buildstream/install directory, but it is not required in /buildstream/build (using it there would slow things down significantly). The sandbox `mark_directory()` API should be enhanced so that an element can communicate its requirements on the marked directories, allowing the sandbox to make a sane choice about this.
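As a rough illustration of the temporary store and the "persist on unmount" step, here is a minimal sketch backed by an in-memory sqlite database, as the list above suggests. The class `AttributeStore`, its table layout and the JSON dump format are all hypothetical. Unknown paths default to UID/GID 0 to match what artifacts currently record.

```python
import json
import sqlite3


class AttributeStore:
    """Hypothetical in-memory store for spoofed ownership and xattrs.

    Keys are paths relative to the sandbox root; nothing here touches
    the underlying filesystem.
    """

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE owner (path TEXT PRIMARY KEY, uid INTEGER, gid INTEGER)")
        self._db.execute(
            "CREATE TABLE xattr (path TEXT, name TEXT, value BLOB,"
            " PRIMARY KEY (path, name))")

    def set_owner(self, path, uid, gid):
        self._db.execute(
            "INSERT OR REPLACE INTO owner VALUES (?, ?, ?)", (path, uid, gid))

    def get_owner(self, path, default=(0, 0)):
        row = self._db.execute(
            "SELECT uid, gid FROM owner WHERE path = ?", (path,)).fetchone()
        return tuple(row) if row else default

    def set_xattr(self, path, name, value):
        # value is expected to be bytes, as xattr values are binary.
        self._db.execute(
            "INSERT OR REPLACE INTO xattr VALUES (?, ?, ?)", (path, name, value))

    def get_xattr(self, path, name):
        row = self._db.execute(
            "SELECT value FROM xattr WHERE path = ? AND name = ?",
            (path, name)).fetchone()
        return row[0] if row else None

    def dump(self):
        """Serialize all recorded attributes, e.g. when the layer is unmounted."""
        owners = {p: [u, g] for p, u, g in
                  self._db.execute("SELECT path, uid, gid FROM owner")}
        xattrs = {}
        for p, n, v in self._db.execute("SELECT path, name, value FROM xattr"):
            # Hex-encode binary xattr values so they survive JSON.
            xattrs.setdefault(p, {})[n] = v.hex()
        return json.dumps({"owners": owners, "xattrs": xattrs}, sort_keys=True)
```

Plain dicts would work just as well for small trees. sqlite mainly pays off if the number of tracked paths grows large or lookups need indexing.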
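The callback rules listed above (chown and xattr writes go to the store; reads come from the store) can be sketched independently of any particular fuse binding. The class below is hypothetical and imports no fuse library. Its method names mirror the FUSE operations `chown`, `setxattr`, `getxattr` and `getattr`, but the signatures are simplified. Ownership and xattrs are kept in plain dicts, and the real files under `root` are only ever read.

```python
import errno
import os


class SpoofingLayer:
    """Hypothetical sketch of the attribute-spoofing fuse callbacks."""

    def __init__(self, root):
        self.root = root
        self.owners = {}   # path -> (uid, gid)
        self.xattrs = {}   # path -> {name: value}

    def _real(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def chown(self, path, uid, gid):
        # Record the ownership change instead of applying it.
        old_uid, old_gid = self.owners.get(path, (0, 0))
        # As with chown(2), -1 means "leave this ID unchanged".
        self.owners[path] = (old_uid if uid == -1 else uid,
                             old_gid if gid == -1 else gid)
        return 0

    def setxattr(self, path, name, value):
        # Always succeeds; only the store is modified.
        self.xattrs.setdefault(path, {})[name] = value
        return 0

    def getxattr(self, path, name):
        try:
            return self.xattrs[path][name]
        except KeyError:
            raise OSError(errno.ENODATA, os.strerror(errno.ENODATA), path)

    def getattr(self, path):
        # Real mode, size and mtime; spoofed ownership from the store.
        st = os.lstat(self._real(path))
        uid, gid = self.owners.get(path, (0, 0))
        return {
            "st_mode": st.st_mode,
            "st_size": st.st_size,
            "st_mtime": st.st_mtime,
            "st_uid": uid,
            "st_gid": gid,
        }
```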
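For the tarball case mentioned above, accepting ownership separately from the files is straightforward with the standard library: `tarfile.TarFile.add()` takes a `filter` callable that can rewrite each member's `TarInfo` before it is written. The function `commit_to_tarball` and its `metadata` mapping are hypothetical and do not reflect the actual artifact cache API. This sketch only carries UID/GID; xattrs would need extra handling, for example via pax headers.

```python
import os
import tarfile


def commit_to_tarball(source_dir, tar_path, metadata):
    """Hypothetical commit: file contents from source_dir, ownership from metadata.

    metadata maps archive-relative paths to (uid, gid); unlisted paths
    are committed as root, matching the current behaviour.
    """
    def apply_metadata(info):
        uid, gid = metadata.get(info.name, (0, 0))
        info.uid, info.gid = uid, gid
        # Clear the user/group names so only the numeric IDs are recorded.
        info.uname = info.gname = ""
        return info

    with tarfile.open(tar_path, "w") as tar:
        # Add top-level entries individually so member names have no "./" prefix.
        for entry in sorted(os.listdir(source_dir)):
            tar.add(os.path.join(source_dir, entry), arcname=entry,
                    filter=apply_metadata)
```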