# Improving artifact/cache archive performance
## What does this MR do?
Rewrites a chunk of the logic that performs archiving and extraction of artifacts and caches.
## Why was this MR needed?
Current issues:

- `filepath.Walk` is slow.
- Multiple `lstat` calls are performed:
  - `filepath.Walk` already provides `os.FileInfo`, but the information is thrown away when adding the file to a list of files to be archived.
  - `os.Lstat` is called again when the file is added to the list, this time it's temporarily kept in a `map[filename]os.FileInfo`.
  - The `os.FileInfo` data in the map is thrown away when the list of files is sorted, returning just the slice of filenames.
  - When adding the file to the archive, `os.Lstat` is called once again, so that the information can be passed to `zip.FileInfoHeader`.

  Whilst the OS will cache the stat information and return much faster on subsequent calls, there's no need to throw the information away and allocate it again. Using `lstat` to ensure a file exists before archiving/extraction is also racy; only opening the file will tell us whether it exists or not.
- Files are extracted sequentially, rather than using goroutines so that decompression can be offloaded to multiple cores.
- There's no buffer reuse/recycling for files being added to, and extracted from, the archive. `io.Copy` is used each time, which allocates a new buffer for each file (see the sketch after this list).
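To illustrate the last point, here's a minimal sketch (not the runner's actual code; the pool size and the `copyEntry` helper are illustrative) of recycling a copy buffer with `sync.Pool` and `io.CopyBuffer`, instead of letting `io.Copy` allocate a fresh one for every file:

```go
package archive

import (
	"io"
	"sync"
)

// bufPool hands out reusable 128 KiB copy buffers.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 128*1024) },
}

// copyEntry copies one archive entry using a pooled buffer. io.CopyBuffer
// still bypasses the buffer if src implements io.WriterTo or dst implements
// io.ReaderFrom, but it never allocates a buffer of its own.
func copyEntry(dst io.Writer, src io.Reader) (int64, error) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf)
	return io.CopyBuffer(dst, src, buf)
}
```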
Solutions:

- Use a faster directory traversal method. Not happy with the current solutions out there, I've written one for this purpose: https://github.com/saracen/walker
- Call `os.Lstat` once and pass the result to `zip.FileInfoHeader`.
- Concurrent file compression requires harder changes, and is something we should address in the future. But concurrent decompression is easy to achieve (see the sketch after this list).
- Use, and then reuse, `bufio.Writer`s and `bufio.Reader`s for buffering.
- The zip archiving and extraction is a generic enough problem that I think it's better to solve it with its own package(s): I've written https://github.com/saracen/fastzip and https://github.com/saracen/zipextra to perform the tasks that were previously done here, but hopefully much better. They expose a simple API to achieve everything we need.
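Here's a minimal sketch of what concurrent decompression can look like using the standard library's `archive/zip` plus `golang.org/x/sync/errgroup` to bound the goroutines. It illustrates the approach, not fastzip's implementation, and omits permissions, timestamps and path sanitisation:

```go
// Package extract sketches concurrent decompression: each entry is
// decompressed in its own goroutine, bounded by the number of CPUs.
package extract

import (
	"archive/zip"
	"io"
	"os"
	"path/filepath"
	"runtime"

	"golang.org/x/sync/errgroup"
)

func Extract(archive, dest string) error {
	r, err := zip.OpenReader(archive)
	if err != nil {
		return err
	}
	defer r.Close()

	var g errgroup.Group
	g.SetLimit(runtime.NumCPU()) // cap the number of concurrent decompressors

	for _, f := range r.File {
		f := f // per-iteration copy for the goroutine
		if f.FileInfo().IsDir() {
			continue
		}
		g.Go(func() error {
			rc, err := f.Open() // safe concurrently: entries are read via io.ReaderAt
			if err != nil {
				return err
			}
			defer rc.Close()

			name := filepath.Join(dest, filepath.FromSlash(f.Name))
			if err := os.MkdirAll(filepath.Dir(name), 0o755); err != nil {
				return err
			}
			w, err := os.OpenFile(name, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, f.Mode())
			if err != nil {
				return err
			}
			defer w.Close()

			_, err = io.Copy(w, rc) // each goroutine decompresses independently
			return err
		})
	}
	return g.Wait()
}
```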
## Other issues solved

Whilst we're making these changes, I've also addressed a few other outstanding issues:

- There are a few operations that happen post extraction: `lchmod`, `chtimes` and `chown`:
  - I now use `lchmod` to suppress the annoying log messages users see in this preventable instance (#3354). Similarly, I'm using `lchtimes` for setting file modification times.
  - `lchmod` and `lchtimes` should never error. If you have permission to create a file, you have permission to set these.
  - `chown` is a pain, and I've addressed it the same way as previously done, with a warning message. But I've also fixed a bug where times were not modified on the extracted file if a chown error was encountered (see the first sketch after this list).
- Archiving prevented packaging files outside of the build directory, but there was no check in place to prevent extraction outside of the build directory.
- When finding a directory outside of the build directory to add to the archive, the current implementation warns about it and ignores it... but then still proceeds to traverse all of its subdirectories, warning for each one that it's also outside of the build directory. However, if the directory is outside of the build directory but the build directory is a subdirectory of it, the current behaviour, whilst traversing all children, is to then add all files/directories of the build directory once it stumbles across it.

  To make this more efficient, and less verbose, I instead warn about the parent directory being outside of the build directory and skip traversing all of its subdirectories. In the instance where the build directory is a subdirectory of that path, I detect this and instead warn that we'll jump to only including the build directory (see the second sketch after this list):

  ```
  WARNING: path / is outside of build directory, but the build directory is a subpath, walking /srv/stuff/gitlab-runner/build-directory
  ```
- The archive tests were using the current working directory and `chdir` to enter a temporary directory and leave once complete. Likewise, the extraction tests were using the working directory but didn't enter and leave a temporary one; they wrote test files to wherever `go test` was called from. I've now added an argument to specify the root directory for both operations.
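For the chown fix, here's a minimal sketch (the helper name and parameters are illustrative, not the MR's actual code; since the Go standard library has no lchtimes, the sketch simply skips symlinks when setting times) of downgrading a chown failure to a warning while still applying the modification times:

```go
package extract

import (
	"log"
	"os"
)

// applyOwnerAndTimes applies ownership and modification times after a file
// has been extracted. A chown failure no longer prevents the times from
// being set; it's only warned about.
func applyOwnerAndTimes(name string, fi os.FileInfo, uid, gid int) error {
	if err := os.Lchown(name, uid, gid); err != nil {
		// Previously an error here meant the times were never set; now we
		// warn and carry on.
		log.Printf("WARNING: chown %s: %v", name, err)
	}
	if fi.Mode()&os.ModeSymlink == 0 {
		// os.Chtimes follows symlinks, so only apply it to non-symlink entries.
		if err := os.Chtimes(name, fi.ModTime(), fi.ModTime()); err != nil {
			return err
		}
	}
	return nil
}
```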
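And here's a minimal sketch (hypothetical helpers, not the MR's actual code) of the containment checks: a directory outside of the build directory is warned about and skipped as a whole, unless the build directory is a subpath of it, in which case the walk jumps straight to the build directory:

```go
package walk

import (
	"path/filepath"
	"strings"
)

// within reports whether child is equal to, or contained in, parent.
func within(parent, child string) bool {
	rel, err := filepath.Rel(parent, child)
	if err != nil {
		return false
	}
	return rel == "." ||
		(rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
}

// checkDir decides what to do with a directory encountered during the walk.
func checkDir(buildDir, dir string) (skip bool, walkInstead string) {
	switch {
	case within(buildDir, dir):
		// Inside the build directory: archive as normal.
		return false, ""
	case within(dir, buildDir):
		// Outside, but the build directory is a subpath: warn once and walk
		// the build directory directly.
		return true, buildDir
	default:
		// Entirely outside: warn once and skip the whole subtree rather than
		// warning for every child.
		return true, ""
	}
}
```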
I know there's a lot to take in here, but I'm having second thoughts about !1173 (closed), a merge request I created some time ago to "fix" this issue by setting the compression level. Whilst that does make things much faster, it's more of a band-aid. I think this MR should take priority, and eventually I'll port customising the compression level over to this solution. It'd be a shame for everybody to blindly set the compression level to be really fast, and then not know that a lot of these issues are better solved by this solution instead. This also provides improvements for everybody, not just those users that looked up the correct flags to set.
## Benchmarks
Writing decent benchmarks for this is quite a difficult task, due to the size and number of files involved and the differences in the code being tested.
Nevertheless, I've timed running both versions of `gitlab-runner` (the current version and the new version with my code) against two large datasets to produce a rough estimate of the gains.
This first table is the results of archiving and extracting with just the "store" method. No compression/decompression takes place. Gains here are from improvements in using a faster walk method, buffer recycling and concurrent extraction.
version | data | operation | method | time |
---|---|---|---|---|
current | linux repo, 70081 files, 3.8G | archiving | store | 12.420s |
new | linux repo, 70081 files, 3.8G | archiving | store | 9.672s |
current | linux repo, 70081 files, 3.8G | extracting | store | 12.973s |
new | linux repo, 70081 files, 3.8G | extracting | store | 9.086s |
current | UE4 Engine, 138085 files, 7.6G | archiving | store | 30.334s |
new | UE4 Engine, 138085 files, 7.6G | archiving | store | 22.939s |
current | UE4 Engine, 138085 files, 7.6G | extracting | store | 33.263s |
new | UE4 Engine, 138085 files, 7.6G | extracting | store | 18.960s |
In the next table, we see that concurrent extraction provides larger gains when the goroutines are busy decompressing:
version | data | operation | method | time |
---|---|---|---|---|
current | linux repo, 70081 files, 3.8G | archiving | standard deflate | 2m43.397s |
new | linux repo, 70081 files, 3.8G | archiving | standard deflate | 2m34.231s |
current | linux repo, 70081 files, 3.8G | extracting | standard deflate | 52.152s |
new | linux repo, 70081 files, 3.8G | extracting | standard deflate | 33.800s |
current | UE4 Engine, 138085 files, 7.6G | archiving | standard deflate | 4m32.904s |
new | UE4 Engine, 138085 files, 7.6G | archiving | standard deflate | 4m19.269s |
current | UE4 Engine, 138085 files, 7.6G | extracting | standard deflate | 1m30.032s |
new | UE4 Engine, 138085 files, 7.6G | extracting | standard deflate | 29.120s |
The next table swaps out the standard library's `compress/flate` with `github.com/klauspost/compress/flate`. klauspost recently rewrote his deflate library and it's now even faster.
version | data | operation | method | time |
---|---|---|---|---|
current | linux repo, 70081 files, 3.8G | archiving | klauspost's deflate | 44.184s |
new | linux repo, 70081 files, 3.8G | archiving | klauspost's deflate | 35.777s |
current | linux repo, 70081 files, 3.8G | extracting | klauspost's deflate | 31.144s |
new | linux repo, 70081 files, 3.8G | extracting | klauspost's deflate | 15.062s |
current | UE4 Engine, 138085 files, 7.6G | archiving | klauspost's deflate | 2m8.177s |
new | UE4 Engine, 138085 files, 7.6G | archiving | klauspost's deflate | 1m54.711s |
current | UE4 Engine, 138085 files, 7.6G | extracting | klauspost's deflate | 1m22.077s |
new | UE4 Engine, 138085 files, 7.6G | extracting | klauspost's deflate | 17.952s |
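As a reference for how such a swap can be wired up (this is an assumption about the wiring, not necessarily what was benchmarked here), `archive/zip` allows a custom compressor/decompressor to be registered for the Deflate method:

```go
package ziputil

import (
	"archive/zip"
	"io"

	"github.com/klauspost/compress/flate"
)

// NewWriter returns a zip.Writer that deflates with klauspost's flate.
func NewWriter(w io.Writer) *zip.Writer {
	zw := zip.NewWriter(w)
	zw.RegisterCompressor(zip.Deflate, func(out io.Writer) (io.WriteCloser, error) {
		return flate.NewWriter(out, flate.DefaultCompression)
	})
	return zw
}

// NewReader returns a zip.Reader that inflates with klauspost's flate.
func NewReader(r io.ReaderAt, size int64) (*zip.Reader, error) {
	zr, err := zip.NewReader(r, size)
	if err != nil {
		return nil, err
	}
	zr.RegisterDecompressor(zip.Deflate, func(in io.Reader) io.ReadCloser {
		return flate.NewReader(in)
	})
	return zr, nil
}
```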
## Summary
In summary, we could just use Klaus' deflate library and see improvements immediately, and perhaps we should do that whilst this MR is being discussed. But I still think this MR is worth it. With concurrent extraction and Klaus' library, we've approached the time it takes to extract the non-compressed version, and the bottleneck becomes just I/O, which is exactly what we want. I also think this direction allows for further improvements, and I have some ideas about how we can achieve concurrent compression.
## What are the relevant issue numbers?
#1797 (closed) https://gitlab.com/gitlab-org/gitlab-ce/issues/33179 #1765 !1173 (closed)