[bug] [repro-case] the gitlab.com CI cache incorrectly changes file permissions
## Summary The gitlab.com CI cache incorrectly changes file permissions between run. For example, some files are put in the cache with permission `rw-rw-rw-` (or `r--r--r--`), and after being restored for the next job their permission changed to `rwxrwxrwx`. This breaks the CI when using tools that check file permissions and may fail or behave in unexpected ways if a file unexpectedly becomes executable. (The [dune](https://github.com/ocaml/dune/) build system is an example of such problem for which my workflow was broken, but it seems that many other Gitlab users have encountered the issue with similar programs.) More precisely, the bug shows up on files that are the target of a symbolic link also included in the cache. (This sounds like the same underlying issue as #3354, but I have a precise bug with a simple reproduction case so I preferred to create a new issue.) ## Steps to reproduce I created a minimal reproduction case for this bug as a Gitlab.com repository: <https://gitlab.com/gasche/gitlab-ci-cache-permission-bug> The git repository contains a small text file `a.ml`, and a symbolic link `link.ml -> a.ml`. The [.gitlab-ci.yml script](https://gitlab.com/gasche/gitlab-ci-cache-permission-bug/-/blob/main/.gitlab-ci.yml) has two job, one that creates a cache (`policy: push`) and copies `a.ml` and `link.ml` in the cache, another that reads from this cache cache (`policy: pull`). At the end of the first job, and at the beginning of the second job, `stat` is called on the cached files; the permissions for `cache/a.ml` have changed from `rw-r--r--` to `rwxrwxrwx`, this is the bug. .gitlab-ci.yml file: ``` stages: - create_cache - use_cache - keep_cache test_create_cache: image: alpine:latest stage: create_cache cache: paths: - cache policy: push script: - "# This job populates the cache for the next job" - stat a.ml link.ml - mkdir -p cache - cp -a a.ml link.ml cache/ - "# show the metadata (including permissions) of cache/" - stat cache/ cache/a.ml cache/link.ml test_use_cache: image: alpine:latest stage: use_cache cache: paths: - cache policy: pull script: - "# This job restores the cache from the previous job." - "# show the metadata of cache/, should be the same as before." - stat cache/ cache/a.ml cache/link.ml ``` ## Actual behavior Look at recent [pipeline results](https://gitlab.com/gasche/gitlab-ci-cache-permission-bug/-/pipelines) in the repository -- currently the last pipeline was <https://gitlab.com/gasche/gitlab-ci-cache-permission-bug/-/pipelines/246796107>. In the [log of the create_cache job](https://gitlab.com/gasche/gitlab-ci-cache-permission-bug/-/jobs/986220648) we see the following output: ``` File: cache/a.ml Size: 10 Blocks: 16 IO Block: 4096 regular file Device: 809h/2057d Inode: 1571568 Links: 1 Access: (0666/-rw-rw-rw-) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2021-01-26 05:49:53.000000000 +0000 Modify: 2021-01-26 05:49:53.000000000 +0000 Change: 2021-01-26 05:49:54.000000000 +0000 File: 'cache/link.ml' -> 'a.ml' Size: 4 Blocks: 8 IO Block: 4096 symbolic link Device: 809h/2057d Inode: 1571579 Links: 1 Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2021-01-26 05:49:54.000000000 +0000 Modify: 2021-01-26 05:49:54.000000000 +0000 Change: 2021-01-26 05:49:54.000000000 +0000 ``` In the [log of the use_cache job](https://gitlab.com/gasche/gitlab-ci-cache-permission-bug/-/jobs/986220650) we see the following output: ``` File: cache/a.ml Size: 10 Blocks: 16 IO Block: 4096 regular file Device: 809h/2057d Inode: 1571543 Links: 1 Access: (0777/-rwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2021-01-26 05:50:09.000000000 +0000 Modify: 2021-01-26 05:49:53.000000000 +0000 Change: 2021-01-26 05:50:09.000000000 +0000 File: 'cache/link.ml' -> 'a.ml' Size: 4 Blocks: 8 IO Block: 4096 symbolic link Device: 809h/2057d Inode: 1571544 Links: 1 Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2021-01-26 05:50:09.000000000 +0000 Modify: 2021-01-26 05:50:09.000000000 +0000 Change: 2021-01-26 05:50:09.000000000 +0000 ``` The permission change for `cache/a.ml` from `rw-rw-rw-` to `rwxrwxrwx` was caused by Gitlab's logic to archive and then restore the cache, this is the bug. ## Expected behavior The permission of files put in the cache should be preserved. ## Other reports that are probably the same issue Many permission-related bugs have been reported against Gitlab and gitlab-runner, but some of them seem unrelated to the present one. (It is possible that the present bug was introduced by a faulty fix for a past issue). Two bug reports seem closely related: - https://gitlab.com/gitlab-org/gitlab-runner/-/issues/3354 , "chmod is used instead of lchmod", specifically discusses a permission change for files that are the target of a link, and points at a problematic piece of code, <https://gitlab.com/gitlab-org/gitlab-runner/blob/master/helpers/archives/zip_extract.go#L98>, which seems responsible for the bug. This bug was reported 2 years ago with reports of several users affected (trying to cache files produced by `npm`, or `yarn`, or `bazel`), some of them declaring the bug as critical for their workflow. It looks like progress on this issue has stalled by lack of a repro-case, so I have that the present one will have a happy ending. - https://gitlab.com/gitlab-org/gitlab/-/issues/18711#note_257196029 is a comment (a year ago) in an issue about choosing the cache/artifact compression method. The comment reports on a permission bug caused by the move of Gitlab from tar.gz to zip. The whole issue touches unrelated topics, but this comment shows that the user @ejgallego encountered the same bug as I did (we are using the same `dune` build system and it is @ejgallego who pointed me in the right direction as I was trying to debug the issue; thanks!). ## Possible fixes In discussion of #3354, @danielmd3000 highlights the following piece of code as the likely cause of this issue: https://gitlab.com/gitlab-org/gitlab-runner/blob/master/helpers/archives/zip_extract.go#L101 ``` for _, file := range archive.File { // Update file permissions if err := os.Chmod(file.Name, file.Mode().Perm()); tracker.actionable(err) { logrus.Warningf("%s: %s (suppressing repeats)", file.Name, err) } // Process zip metadata if err := processZipExtra(&file.FileHeader); tracker.actionable(err) { logrus.Warningf("%s: %s (suppressing repeats)", file.Name, err) } } ``` It sounds plausible that the bug is caused by the `Chmod` call following the symlink, and changing the permissions of the link target to the permissions of the link itself. The reporter suggests to use a `lchmod` call instead (according to [this freebsd manpage](https://www.freebsd.org/cgi/man.cgi?query=lchmod&sektion=2), lchmod is similar to chmod bug does not follow symlinks). In fact if I run `git grep lchmod` in the current gitlab-runner directory, I see similar-looking code in the vendored code of the `fastzip` dependency (no idea whether this is what the cache-extractor uses): https://gitlab.com/gitlab-org/gitlab-runner/-/blob/master/vendor/github.com/saracen/fastzip/extractor.go#L294-296 (This dependency includes an implementation of `lchmod` in go, no idea whether it is correct.) It looks like this issue is caused by the fact that Go's `archive/zip` provides an API to read a zipfile content, but does *not* provide an easy way to correctly extract an archive in a directory. Users have to traverse the archive themselves and creates files themselves, and most of them (including apparently Gitlab) get it wrong -- when calling the venerable `unzip` command instead would work perfectly well. (I suppose that some Go progammers have implemented a correct unzipper for zip archives, and it might be wise to reuse that instead of maintaining your own code. But in the meantime, fixing the gitlab-runner implementation would be very helpful to affected users.) Finally: browsing the web points to snippets for zip-unarchiving, for example<https://golangcode.com/unzip-files-in-go/>, several of which containing a specific protection against a so-called [ZipSlip](https://snyk.io/research/zip-slip-vulnerability) vulnerability, and it appears that Gitlab's implementation does not include such a protection.
issue