Try remote caching

Mikhail Mazurskiy requested to merge ash2k/bazel-remote-cache into master

Closes #11 (closed).

The latest test job for master took around 22 minutes. Of that, 8+ minutes were spent downloading and unpacking the GitLab CI cache, and about another 8 minutes packing and uploading it. So 16 minutes is the price of having the cache enabled, which means the build itself takes around 6 minutes using the downloaded cache.

A build without the CI cache takes 12 minutes, so... it's better without the cache?! Well, if the cache were already on disk and unpacked instantaneously, that would be a 2x improvement (6 vs 12 minutes). But downloading and unpacking takes a lot of time. This is likely because our build has a lot of dependencies and therefore generates a lot of intermediate files, so the cache is big both in MB and in number of files (37,000+).
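The arithmetic behind this comparison can be sketched in a few lines of shell. The numbers are the rounded timings quoted above; the variable names are only illustrative.

```shell
# Timings in minutes, rounded, taken from the master test job described above.
total=22            # whole job with the GitLab CI cache enabled
cache_download=8    # downloading and unpacking the CI cache
cache_upload=8      # packing and uploading the CI cache
build_without_cache=12

# Time spent just moving the cache around.
cache_overhead=$((cache_download + cache_upload))
# What is left for the build itself once the cache is on disk.
build_with_cache=$((total - cache_overhead))

echo "CI cache overhead:     ${cache_overhead} min"
echo "Build with warm cache: ${build_with_cache} min"
echo "Job with CI cache: ${total} min, job without it: ${build_without_cache} min"
```

The cache halves the build itself (6 vs 12 minutes), but the 16 minutes of transfer make the whole job slower than not caching at all.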

This MR enables Bazel's remote caching.
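As a rough sketch of what enabling remote caching looks like on the command line (not necessarily the exact configuration in this MR): the bucket URL below is a hypothetical placeholder, and authenticating through Google default credentials is an assumption based on the cache living in a GCP bucket. The snippet only prints the command so it can be inspected without touching the cache.

```shell
# Hypothetical bucket URL (assumption); replace with the real cache location.
REMOTE_CACHE_URL="https://storage.googleapis.com/example-bazel-cache"

# --remote_cache points Bazel at the shared cache.
# --google_default_credentials authenticates with Application Default Credentials
#   (assumed here; other auth setups are possible).
# --remote_download_minimal is the flag used in the third run below.
BAZEL_CMD="bazel test //... --remote_cache=${REMOTE_CACHE_URL} --google_default_credentials --remote_download_minimal"

# Dry run: print the command instead of executing it.
echo "${BAZEL_CMD}"
```

The same flags could also live in a `.bazelrc` file so that every invocation picks them up.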

First run

Duration: 33 minutes 48 seconds

https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/jobs/701790804

This is slow because the build populates the remote cache by uploading all the build artifacts, including the intermediate ones.

Second run - no code changes

Duration: 5 minutes 38 seconds

https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/jobs/701816523

This is much faster! Faster than the original 22-minute build.

Third run - no code changes, --remote_download_minimal flag

Duration: 4 minutes 54 seconds

https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/jobs/701822277

This is even better. But if everything is cached, what's happening for 5 minutes?! It's downloading all the libraries we depend on. More on this in the FAQ. It's unfortunate, but we cannot do much about it at the moment.

p.s. To learn more about remote caching and remote execution in Bazel, watch this talk: https://www.youtube.com/watch?v=MyuJRUwT5LI. There are many other talks and blog posts on this topic if you are curious.

p.p.s. Let's try a local build:

bazel test //...
INFO: Invocation ID: 4ac3b676-4686-40da-879d-0a529db036b4
INFO: Build option --test_env has changed, discarding analysis cache.
INFO: Analyzed 33 targets (654 packages loaded, 11719 targets configured).
INFO: Found 27 targets and 6 test targets...
INFO: Elapsed time: 3225.158s, Critical Path: 3163.48s
INFO: 623 processes: 623 darwin-sandbox.
INFO: Build completed successfully, 624 total actions
//internal/agentk:go_default_test                                        PASSED in 6.9s
//internal/gitlab:go_default_test                                        PASSED in 0.5s
//internal/kas:go_default_test                                           PASSED in 0.5s
//internal/tools/testing/kube_testing:go_default_test                    PASSED in 0.8s
//internal/tools/wstunnel:go_default_test                                PASSED in 5.0s
//pkg/agentcfg:go_default_test                                           PASSED in 0.4s

INFO: Build completed successfully, 624 total actions

3225.158s is almost 54 minutes. My internet connection from Sydney to the GCP bucket in the US is struggling quite a bit. Also, it's just 100/40 Mbps down/up.

Let's try another time, now with fully populated remote and local caches:

bazel test //...
INFO: Invocation ID: 1d7ae98f-0c87-44da-bd06-033fd5ab1328
INFO: Build option --test_env has changed, discarding analysis cache.
INFO: Analyzed 33 targets (0 packages loaded, 11719 targets configured).
INFO: Found 27 targets and 6 test targets...
INFO: Elapsed time: 0.909s, Critical Path: 0.24s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
//internal/agentk:go_default_test                               (cached) PASSED in 6.9s
//internal/gitlab:go_default_test                               (cached) PASSED in 0.5s
//internal/kas:go_default_test                                  (cached) PASSED in 0.5s
//internal/tools/testing/kube_testing:go_default_test           (cached) PASSED in 0.8s
//internal/tools/wstunnel:go_default_test                       (cached) PASSED in 5.0s
//pkg/agentcfg:go_default_test                                  (cached) PASSED in 0.4s

INFO: Build completed successfully, 1 total action

1 second.

Let's try another time, with a clean local cache:

bazel test //...
INFO: Invocation ID: 84940def-0584-495d-a9bd-94a0831fc5cf
INFO: Build option --test_env has changed, discarding analysis cache.
INFO: Analyzed 33 targets (654 packages loaded, 11719 targets configured).
INFO: Found 27 targets and 6 test targets...
INFO: Elapsed time: 118.916s, Critical Path: 69.62s
INFO: 1298 processes: 1290 remote cache hit, 8 darwin-sandbox.
INFO: Build completed successfully, 1306 total actions
//internal/agentk:go_default_test                               (cached) PASSED in 1.2s
//internal/gitlab:go_default_test                               (cached) PASSED in 0.9s
//internal/kas:go_default_test                                  (cached) PASSED in 1.0s
//internal/tools/testing/kube_testing:go_default_test           (cached) PASSED in 0.8s
//internal/tools/wstunnel:go_default_test                       (cached) PASSED in 1.0s
//pkg/agentcfg:go_default_test                                  (cached) PASSED in 1.3s

INFO: Build completed successfully, 1306 total actions

2 minutes, which is very good, taking into consideration my internet connection.
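For reference, the elapsed times of the first and third local runs can be converted to minutes with a small helper. This is only a sketch: `to_min_sec` is a made-up name, and the fractional seconds from the Bazel output are truncated.

```shell
# Convert whole seconds to "<minutes>m <seconds>s".
to_min_sec() {
  echo "$(( $1 / 60 ))m $(( $1 % 60 ))s"
}

# Elapsed times from the runs above, truncated to whole seconds.
cold=$(to_min_sec 3225)        # empty remote cache, everything built and uploaded
remote_only=$(to_min_sec 118)  # clean local cache, populated remote cache

echo "Cold run:              ${cold}"
echo "Remote cache hits run: ${remote_only}"
```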
