CACHE_FALLBACK_KEY is no longer useful, because caches by default now have a suffix attached to their name at creation time.
Overview
Our cache isn't shared anymore between our default branch (protected) and our feature branches (unprotected) because of a new dynamic suffix, added after cache-<index>, that depends on ref protection.
A -non_protected suffix is added to the feature branch job:cache:key when it tries to fetch the default branch job:cache:key, which was created with the -protected suffix. See logs.
Expected behavior
A way to disable this suffix.
A new variable could fix it for FALLBACK_KEY, but I can't see how to fix it when using a lockfile as the key.
Note that cache-<index> already causes problems with FALLBACK_KEY, and ours depends on $CI_MERGE_REQUEST_TARGET_BRANCH_NAME.
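For reference, a minimal sketch of the kind of configuration affected (job name and paths are illustrative, not taken from the actual pipeline):

```yaml
test:
  variables:
    # Only matched if a cache with exactly this name exists; the runner does
    # not append the -protected/-non_protected suffix to the fallback key.
    CACHE_FALLBACK_KEY: "$CI_JOB_NAME:$CI_MERGE_REQUEST_TARGET_BRANCH_NAME"
  cache:
    key:
      files:
        - yarn.lock        # key becomes a hash of the lockfile; the runner appends the suffix
    paths:
      - node_modules/
  script:
    - yarn install --frozen-lockfile
```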
Relevant logs and/or screenshots
Default branch job log
Checking cache for jest:mobile:master-3-protected...
Checking cache for project-ea268a7e34ce64f1a07ba41343c9c84cd6a48bb0-3-protected...
Feature branch job log
Checking cache for jest:mobile:1703-link-faq-details-3-non_protected...
WARNING: file does not exist
Failed to extract cache
Checking cache for jest:mobile:master...
Checking cache for project-ea268a7e34ce64f1a07ba41343c9c84cd6a48bb0-3-non_protected...
Used GitLab Runner version
Running with gitlab-runner 14.10.0~beta.50.g1f2fe53e (1f2fe53e) on blue-4.shared.runners-manager.gitlab.com/default J2nyww-s
Workarounds
Using a lockfile key: generate the cache on a non-protected branch for each lockfile update...
Using a fallback key: add -protected or -non_protected manually, as with cache-<index>...
Approach #2: Create a scheduled job on a non-protected branch that follows the default branch, and generates a shared cache that is reused by feature branches.
Approach #3: For self-managed users using remote caches: use a CI job to sync the bucket from the default branch to the current branch.
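Approach #2 could be sketched roughly like this (the mirror branch name "cache-warmer" and the lockfile are assumptions; the branch must be kept in sync with the default branch and left unprotected so the cache it pushes gets the -non_protected suffix):

```yaml
# Scheduled job on an unprotected branch that mirrors the default branch.
# It pushes a cache under the -non_protected suffix, so feature branches
# can pull it on their first pipeline.
warm-cache:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "cache-warmer"'
  cache:
    key:
      files:
        - yarn.lock
    paths:
      - node_modules/
    policy: push           # write-only: this job exists solely to populate the cache
  script:
    - yarn install --frozen-lockfile
```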
Thanks, this doc is really clear. It's a regression for us, because non_protected branches would only pull the cache from protected ones without pushing to it, but I see your technical/security reasons; we'll try to do it differently.
@pedropombeiro I see that the doc mentions it was introduced in 15.0, but the change has been back-ported in 14.8.6, 14.9.4, and 14.10.1. Also, the link to the issue is incorrect.
Unfortunately, we're not ready to disclose additional information for the time being to allow users enough time to safely upgrade, so that merge request needs to be kept private.
The same problem has just bitten us. We use the master branch's unit-test running times to parallelise the unit tests in the feature branches; now we can't do that, and all our feature pipelines are failing.
Same here: our cache is getting stored in MinIO with a -non_protected suffix on the name. A subsequent job that deletes these caches is now failing because the name is not what it expects.
That is correct, this was part of a security fix. The idea was that caches would be regenerated under the new names, making it a transparent operation for users - other than a one-time performance hit while regenerating the cache. I see now that this was likely an optimistic view of the problem, considering the myriad ways that users may be leveraging caches.
That being said, it is important for us to understand what your different scenarios are, that have prevented things from going as planned, so I would really appreciate it if users coming here could explain why regenerating the cache under a new name is breaking their workflow.
Lockfile caches (based on yarn.lock) started to fail in MR pipelines from non-protected source branches. Why? The job generating this cache only runs on specific changes (maybe a bad practice?) and other jobs only pull this cache.
Generating a new lockfile cache for non_protected branches (just once) fixed the problem; we won't need to repeat this manipulation, as future lockfile keys will be generated correctly.
The only difference for me is that there are now lockfile caches for protected branches and lockfile caches for non-protected branches. It is not a real problem, but this duplication will of course add some CI minutes to our process.
2. MR Pipelines - QA / Tests cache
In order to use QA/test widgets, reports, and badges with correct values, we have to lint/test our entire project (not only the changes). We're using the cache to save CI minutes.
But in MR pipelines we use $CI_JOB_NAME:$CI_MERGE_REQUEST_TARGET_BRANCH_NAME as the FALLBACK_KEY, in order to pull the job:cache from the target branch on the first pipeline (saving CI minutes), and then generate the job:cache for the source branch using $CI_JOB_NAME:$CI_COMMIT_REF_NAME as the job:cache:key.
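The setup described corresponds roughly to this sketch (the job name and cached paths are assumptions):

```yaml
jest:mobile:
  variables:
    # First pipeline on a source branch has no cache of its own yet,
    # so fall back to the target branch's cache.
    CACHE_FALLBACK_KEY: "$CI_JOB_NAME:$CI_MERGE_REQUEST_TARGET_BRANCH_NAME"
  cache:
    key: "$CI_JOB_NAME:$CI_COMMIT_REF_NAME"   # per-branch cache going forward
    paths:
      - .jest-cache/
  script:
    - yarn jest --ci
```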
This is adding some CI minutes again, because FALLBACK_KEY is not working anymore (note that we were already impacted by cache-<index>), but it won't affect our pipeline status, so it's OK.
Maybe we're doing it wrong? But the main problem for me is FALLBACK_KEY: what is its purpose? It doesn't include any cache key suffix, yet I can manually specify <key>-3-protected? Moreover, isn't it just a fallback that only checks/pulls an accessible cache, without any security issue?
Hello! This affects my teams, with no satisfying workaround.
Use case: GitLab Pages does not provide per-branch HTML serving. We use the cache to gather previously generated HTML from other branches, update the current branch's content, and then serve all branches while updating the cache for the next run.
Now the master cache is not accessible anymore. We will probably unprotect our master branch for now, which is (very) far from ideal. I don't see how CACHE_FALLBACK_KEY can help me: since an unprotected cache is available, the fallback one is never accessed.
As in, it would pull from the protected cache if the non_protected cache does not exist, but then follow the policy of writing to non_protected? Also, there is no way to enable this currently?
Would you add another solution, like protected variables? Protected by default, but we can opt out by unticking a checkbox in the protected section of the UI.
Some people would like to protect branches from direct/forced pushes, but not from anything else.
As in, it would pull from the protected cache if the non_protected cache does not exist, but then follow the policy of writing to non_protected? Also, there is no way to enable this currently?
@jimmy-outschool Correct. This is something that would need to be developed on the Runner side still. This logic doesn't exist yet.
My use case is an aggregation of HTML produced per branch and served with GitLab Pages, using pull-push caching.
Example scenario:
- Pipeline on the main branch: produces and caches HTML. The (shared) cache contains:
  - public/main/*.html
- Pipeline on new feat-branch-1; the (shared) cache contains:
  - public/main/*.html
  - public/feat-branch-1/*.html
- Pipeline on new feat-branch-2; the (shared) cache contains:
  - public/main/*.html
  - public/feat-branch-1/*.html
  - public/feat-branch-2/*.html
- feat-branch-1 is merged and a pipeline on main is triggered; when finished, the (shared) cache contains:
  - public/main/*.html [updated]
  - public/feat-branch-2/*.html
  - (public/feat-branch-1/*.html is deleted by some code checking which branches still exist)
With CACHE_FALLBACK_KEY, I could pull the main cache from feature branches, but only when those branches have no cache of their own. That would be OK if I didn't then push the cache, but I can't find a way to make my scenario work with this.
I have a better workaround though: an intermediate, unprotected develop branch, synchronized with main's content, since the only protection I want is against accidental deletion.
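The scenario above corresponds roughly to a single shared pull-push cache like this (the generator and cleanup scripts are hypothetical placeholders, not part of the original setup):

```yaml
pages:
  cache:
    key: pages-html          # one shared cache name across all branches
    paths:
      - public/
    policy: pull-push        # read other branches' HTML, then add/update ours
  script:
    - ./generate-html.sh "public/$CI_COMMIT_REF_SLUG"   # hypothetical: render this branch's pages
    - ./prune-deleted-branches.sh public/               # hypothetical: drop dirs of deleted branches
  artifacts:
    paths:
      - public/
```

With the protection suffix, the main branch's pipeline and the feature branches' pipelines now address two different caches even though the key is identical, which breaks the aggregation.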
@manvydas.urniezius that could allow leaking of secrets from the protected cache. Are you suggesting giving the project maintainer the responsibility for taking that decision, with the caveat that it could expose secrets if they were ever (even inadvertently) added to the protected cache?
I wonder what the use case is for adding secrets to a cache.
One of the most common scenarios I'm aware of is pushing dependencies (e.g. node_modules) to the cache. In that case, there should be no danger in sharing these dependencies between protected/non_protected caches. With this common scenario in mind, it would be beneficial to have some sort of per-cache setting for this behaviour.
Are you suggesting giving the project maintainer the responsibility for taking that decision, with the caveat that it could expose secrets if they were ever (even inadvertently) added to the protected cache?
Or at least providing a mechanism for a maintainer to be able to assert that particular caches won't (or will) contain secrets.
We're also hit by this change, and I don't see any of the workarounds working for us. We want to keep a cache of the node_modules directory whenever package-lock.json changes. All changes to package-lock.json happen in an unprotected branch. On merge to master, which is protected, the Build job fails because it cannot find the cache anymore. It seems I cannot set a custom CACHE_FALLBACK_KEY because the cache key generated by key:files is not available to me.
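The configuration described is presumably along these lines; because the runner derives the key from a hash of the lockfile, there is no variable the user could reference in CACHE_FALLBACK_KEY to reproduce it:

```yaml
build:
  cache:
    key:
      files:
        - package-lock.json   # the key becomes an opaque hash of this file's content
    paths:
      - node_modules/
  script:
    - npm ci
```

Since all lockfile changes land via unprotected branches, the hash-keyed cache only ever exists with the -non_protected suffix, and the protected master pipeline can never see it.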
We used several caches that were written by a protected branch and read by non-protected.
Use case 1: PHP Composer caches to avoid downloads. Most branches do not change dependencies, so writing caches on one stable branch prevents duplicate downloads in all feature branches.
Use case 2: a file with a list of known problems written by a protected branch that all feature branches read.
Both would work with approach 1, although I would prefer a way to disable these suffixes per cache. I already know that the non-protected caches cannot exist; requesting them is pointless.
Why is control taken away from me? Default is fine but please give me a way to disable suffixes for use cases where they don't make sense.
Why is control taken away from me? Default is fine but please give me a way to disable suffixes for use cases where they don't make sense.
Completely agree. This behavior had existed for a long time, and the black-and-white view of security is a bit too iron-fisted, especially for private repositories. Given the breaking nature of the change, the same priority applied when making it should be applied to restoring the old behavior as an opt-in.
Falling back to the protected cache if the non_protected cache is not present would solve this for us.
Not adding uncontrollable suffixes to the cache keys we specify ourselves would also solve this for us.
This regression hits us as well, and it is unfortunate that we cannot solve this easily.
We want jobs on a new and unprotected branch (e.g. a new MR) to use the caches created by the protected main branch if no own caches exist yet. We had implemented this by adding two caches to each job and three lines of code to use the default cache if needed:
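The snippet itself isn't shown here, but a setup matching that description, with a per-branch cache plus a read-only default-branch fallback cache, might look like this sketch (paths and job name assumed):

```yaml
test:
  cache:
    # Primary cache, written per branch.
    - key: "$CI_COMMIT_REF_SLUG"
      paths:
        - node_modules/
    # Default-branch cache, read-only so unprotected jobs never write to it.
    - key: "$CI_DEFAULT_BRANCH"
      paths:
        - node_modules/
      policy: pull
  script:
    # Fall back to a fresh install only if neither cache provided the directory.
    - test -d node_modules || npm ci
    - npm test
```

With the new suffixes, the second entry resolves to e.g. main-non_protected on feature branches, which never exists, so the fallback cache is dead weight.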
one needs to manually update the counter on cache clear (#360438 (moved))
The only viable solution I can see now is to create an extra unprotected branch that re-creates all caches with the correct suffix, which is not ideal. Is there any other way I can get this working for non-distributed caches?
Definitely weird that we have to put a workaround in place for an unexpected and unannounced breaking change (CACHE_FALLBACK_KEY: "${CI_DEFAULT_BRANCH}-protected") and then revert it when things eventually get fixed.
Any updates on this? We have the same problem. We use the cache files feature for our install pipelines; CACHE_FALLBACK_KEY is no option for us, but I think Approach #1 would work.
A possible solution for self-managed instances could be a flag to disable this feature, so everybody can decide for themselves.
I don't have an estimate for how quickly this will make it, but I can tell you for sure that it will not make it in the 15.0 release as the code freeze will happen in a few days.
For everyone having trouble with the cache suffix: my MR implementing a setting for this has been deployed to gitlab.com, and you can now disable separated caches.
I don't know if it will make it into the 15.0 self-managed release, but I think the chances are still good.
Update: the setting is confirmed to be included in the 15.0 release candidate and has also been backported to 14.10 via the 14.10.3 patch release, which shipped today.
This is great news! Thanks for making it possible so fast. I've got confirmation from my customer that they see the option in settings now. They have a follow-up question:
Is there a way to change the default value for any project created under the group, or is this setting only available per repository?
I can't find anything in the API docs: is this possible to set via the API? I don't want to set it manually for every project in our instance (1000+ projects). BTW, a global setting for this would be nice.
There's an extra cache version suffix that comes from invalidating caches. That's what the 14 is.
It's a separate thing, and as long as you stay within the same pipeline and don't use the cache fallback key, you'll be fine with the steps outlined here.
I'm assuming you're being hit because you're referencing the full name of the cache somewhere else (probably due to the parent-child pipeline thing).
If we set CACHE_FALLBACK_KEY: non-existent-key to something non-existent, we then start receiving the following CI output:
Restoring cache
Checking cache for random-key-should-never-exist-non_protected...
WARNING: file does not exist
Failed to extract cache
Checking cache for non-existent-key...
WARNING: file does not exist
Failed to extract cache
@DarrenEastman I think this fits the issue, but I'm happy to create a separate one. I ran into this while working with a customer in #414305 (internal).
CACHE_FALLBACK_KEY seems to be almost useless now that caches by default have a suffix attached to their name at creation time. Here's a simple example, assuming default branch protection settings (main protected, nothing else):
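A sketch of a config matching that example (reconstructed here to make the walkthrough concrete; the job name and cached path are assumptions):

```yaml
test:
  variables:
    CACHE_FALLBACK_KEY: "test-main"       # fallback used verbatim, no suffix appended
  cache:
    key: "test-$CI_COMMIT_REF_NAME"       # runner appends -protected / -non_protected
    paths:
      - .cache/
  script:
    - mkdir -p .cache && date > .cache/stamp
```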
Run this on the main branch, and a cache named test-main-protected will be created.
Next, run this on a branch called somebranch. The CI job will first check for a cache named test-somebranch-non_protected – this does not exist. It will then attempt the CACHE_FALLBACK_KEY and look for a cache named test-main. This also does not exist, because the earlier run created test-main-protected.
The only workaround would be to hardcode CACHE_FALLBACK_KEY and add the -protected suffix yourself. That somewhat defeats the purpose of it being a fallback, though.
I think one could avoid this via "Use the same cache for all branches", but a) this is usually not desired, and b) the docs for the global fallback key do not mention this limitation at all. To my understanding, the example currently given in the docs has absolutely no effect when applied to a new project with default settings. The fallback key defined there will never produce a hit, because it is impossible to create a cache with that name.
It becomes more complex when taking into account that the [non_]protected part is not the only possible suffix. When you "Clear the cache" manually, an index is incremented and from then on appended to every cache name upon creation.
So now you'd have to hardcode CACHE_FALLBACK_KEY to e.g. $CI_JOB_NAME-$CI_DEFAULT_BRANCH-7-protected, and update this whenever the cache is manually reset.
As far as I know, the [non_]protected part was added for security reasons, so I'm wondering if the CACHE_FALLBACK_KEY behavior was forgotten when the suffixes were added.
In this comment (cc @ratchade) it almost sounds like the new fallback_keys keyword is supposed to be a replacement for the CACHE_FALLBACK_KEY, but they don't serve the exact same purpose.
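For context, fallback_keys is configured per cache entry rather than as a global variable; a minimal sketch, assuming a Node project (note that, unlike CACHE_FALLBACK_KEY, these keys also receive the runner's protection suffix):

```yaml
test:
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    fallback_keys:
      - "$CI_DEFAULT_BRANCH"   # tried next if the primary key has no cache
    paths:
      - node_modules/
  script:
    - npm ci --prefer-offline
```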
@pedropombeiro Based on the user impact it appears that we need to look at an option to address the problems introduced by the security fix. Does finding a solution require time set aside for a spike in an upcoming iteration?
@DarrenEastman I believe we should add this to a milestone refinement issue so that we can take a holistic look at the current state and alternatives, but there are a couple of thoughts that come to mind:
the previous state already had an issue with the incrementing cache index. So we could at least try to get back to the same situation by adding the [non_]protected suffix to CACHE_FALLBACK_KEY on the Runner side based on the ref status.
we need to validate that #361235 (closed) does fix the issue with the incrementing cache index (as I'd expect it does).
@pedropombeiro I just tested this out in our self hosted Gitlab (16.4) and it only partly works. Here's a snippet from the job log (sorry, can't share link):
Checking cache for test-new-cache-key-fallback-3-non_protected...
WARNING: file does not exist
Failed to extract cache
Checking cache for main-3-non_protected...
WARNING: file does not exist
Failed to extract cache
As you can see, if the pipeline is running for an unprotected branch, it tries to restore the unprotected cache even if the fallback key points to a protected branch, like so:
So while fallback_keys is working better than the env variable currently is, it still doesn't solve the problem completely without turning off cache separation between protected and unprotected refs.
EDIT:
Now that I've actually read what the setting says, that's expected behaviour, so you can just disregard what I said.
Wouldn't it be an option to allow reading from protected caches, but not writing to them?
As I see it, the separation into protected and non-protected caches is to avoid cache poisoning (malicious write from a non-protected branch into the protected cache). But I don't see how reading from a protected cache could cause any security implications.