Incremental cache files
Description
I've noticed that your cache step basically copies a zip file and unzips it, then creates a new zip file and zips everything the cache key needs. This takes a long time when there are many small files.
One example is the .cocoapods folder on a big Unity project, which can contain up to 800K small files. This also affects native iOS projects, since the folder easily grows as you add dependencies to your code. Other examples are an .npm folder, vendor for Ruby gems, PHP Composer packages, and so on.
Proposal
Make the cache files incremental. Either use an --rsyncable gzip file (https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/), or simply add folders/files to the original cache.zip only when they are newer than the ones already there, and remove the ones that are missing from the current workdir (rsync can do this automatically, but it may not be as multi-platform as needed).
It would also help a lot to expose the compression level so we can change it. I'd prefer a bigger cache file over a slower pipeline, but some people might not agree with me on that.
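The trade-off behind exposing the compression level is easy to demonstrate with zlib (the same DEFLATE algorithm zip uses); the payload below is just a made-up stand-in for a cache directory's contents:

```python
import time
import zlib

# compressible dummy payload standing in for cached dependency files
payload = b"pod 'Firebase/Analytics'\npod 'Alamofire'\n" * 50_000

for level in (1, 6, 9):  # 6 is zlib's default
    start = time.perf_counter()
    blob = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(blob):>8} bytes in {elapsed:.4f}s")
```

Level 1 typically produces a somewhat larger file much faster than level 9, which is exactly the "bigger cache, faster pipeline" trade some of us would gladly make.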
Links to related issues and merge requests / references
I can't add links since it's a private project, but here are some relevant log lines:
Thu Oct 24 17:59:35 UTC 2019: == [exec] step started ==
Thu Oct 24 17:59:35 UTC 2019: == [archive_cache] sub-step started ==
Thu Oct 24 17:59:35 UTC 2019: Getting ip address...
Thu Oct 24 17:59:37 UTC 2019: Environment information requested. Got response: {"id":"**REMOVED**","ip":"**REMOVED**"}
Thu Oct 24 17:59:37 UTC 2019: Connecting through ssh@**REMOVED** to execute command...
Thu Oct 24 17:59:37 UTC 2019: Waiting for ssh to finish...
Creating cache build_ios...
Runtime platform arch=amd64 os=darwin pid=2995 revision=22516659 version=
Library/: found 16218 matching files
vendor/: found 8516 matching files
.cocoapods/: found 806788 matching files
Builds/App-Package/Pods/: found 1450 matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Created cache
Thu Oct 24 18:13:01 UTC 2019: == [archive_cache] sub-step ended ==
Thu Oct 24 18:13:01 UTC 2019: == [exec] step ended ==
I've **REMOVED** some information to avoid disclosing IPs and pipeline/job IDs; I wouldn't mind giving more information to GitLab developers if you need it.
As you can see, the zip process itself takes about 15 minutes, which defeats the purpose of caching: whatever time the cache shaves off the build, it adds back when re-creating the cache file. If multiple steps need this cache, the cost multiplies.
This build ran on a macOS vSphere VM with a 12-core CPU and 16 GB of RAM. I think that's a pretty reasonable machine, though more RAM might help. As far as I've seen, zip doesn't take much advantage of multiple cores, so adding more CPUs shouldn't make a difference.