Waste less time compiling Octez
Motivation
Compiling Octez takes a lot of time.
- Compiling from scratch: `make` takes 10 minutes on my brand-new laptop. This time doesn't include `make build-deps` though.
- I just `git pull`ed `master` and ran `make`; it took 10 minutes too.
- Then I created an empty commit and ran `make`; it took 30 seconds.
- Then I did absolutely nothing and ran `make`; it took 12 seconds to realize that there was nothing to do.
Each developer encounters:
- the first case every time we update some external libraries, i.e. about every 1-3 weeks;
- the second case at least once per day;
- the third and fourth cases dozens of times per day.
Summing up all of these scenarios, a rough estimate is that engineers can easily spend 30 minutes every day waiting for Octez to compile, unable to do much else because 100% of their CPU is in use. Each time, they may forget what they were doing and lose even more time to the context switch. They may also decide to skip some compilations and just hope that it works, which causes them to notice issues later, in code that they have already moved on from, and so to lose time once again to another context switch. So those 30 minutes can easily become 1 hour, which is a significant part of an 8-hour working day.
Not only is that significantly impairing productivity, it also lowers morale.
Scope
The goal of this milestone is twofold:
- introduce a "slim" mode where one would compile Octez with only a small selection of protocols, such as genesis, Alpha and the current protocol of Mainnet;
- investigate to find other ideas.
(I name it "slim" mode instead of "light" because the Octez client already has an unrelated "light" mode.)
Not compiling protocols is already something that is done in the CI. It was introduced years ago, at a time when the number of protocols was significantly lower, and it was already a significant improvement on the CI wall time back then. Having an environment variable control the manifest to disable some protocols should be easy; the difficult part is doing that in such a way that `git commit -am` would not commit the slim mode configuration by mistake. Ideally, one could also easily configure the set of protocols to compile.
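As a rough illustration of the selection logic only, the sketch below picks the set of protocols to compile from an environment variable. Everything named here is hypothetical: the variable `OCTEZ_SLIM_PROTOCOLS`, the function `select_protocols`, the default list and the protocol names are placeholders, and Python is used only to show the logic, not how the manifest is implemented. It does not address the harder problem of keeping the slim configuration out of `git commit -am`.

```python
import os

# Hypothetical default selection for slim mode: genesis, Alpha and a
# placeholder name standing for the current Mainnet protocol.
DEFAULT_SLIM = ["genesis", "alpha", "mainnet_current"]


def select_protocols(all_protocols, env=None):
    """Return the protocols to compile, in their original order.

    - variable unset: compile everything (normal mode);
    - variable set but empty: compile DEFAULT_SLIM;
    - otherwise: compile the comma-separated names it lists.
    """
    env = os.environ if env is None else env
    raw = env.get("OCTEZ_SLIM_PROTOCOLS")  # hypothetical variable name
    if raw is None:
        return list(all_protocols)
    wanted = [name.strip() for name in raw.split(",") if name.strip()]
    wanted = wanted or DEFAULT_SLIM
    unknown = [name for name in wanted if name not in all_protocols]
    if unknown:
        # Fail loudly rather than silently building nothing.
        raise ValueError(f"unknown protocols: {', '.join(unknown)}")
    return [p for p in all_protocols if p in wanted]
```

Treating "unset" and "set but empty" differently is a design choice of this sketch: it lets a developer opt in with a bare `OCTEZ_SLIM_PROTOCOLS=` while still allowing a custom list.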
The second part is more exploratory. At the end of the milestone, we will evaluate if it is worth pursuing our investigations in a second milestone. Some ideas are:
- rework how commit hashes are stored in executables (the overhead of changing the commit hash being about 20s for an empty commit);
- run `dune` with some verbose logs to investigate which files take time to compile;
  - in particular, protocol environments took significantly longer than other files to compile in the past, so it may be worth understanding why.
Experiments
Empty Build
Run `make`, then `time make`, and note the time it took to run the second `make`.
- on `master` it takes about 12s (about 54k targets to build)
- if one removes old protocols it takes about 8s (about 32k targets to build)
Assumption: this time is roughly proportional to the number of files that `dune` has to compile, which is in turn roughly proportional to the time that it takes to run `make` from scratch. So it looks like not compiling old protocols has the potential to reduce build time by 1/3.
Full Build
Run `rm -rf _build`, then `time make`.
- on `master`: 10 minutes
- without old protocols: 7 minutes
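The figures from the two experiments can be cross-checked against the proportionality assumption with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the helper name `saving` is made up for the example.

```python
# Measurements quoted in the two experiments above.
targets_k = {"master": 54, "slim": 32}        # thousands of targets
empty_build_s = {"master": 12, "slim": 8}     # seconds
full_build_min = {"master": 10, "slim": 7}    # minutes


def saving(before, after):
    """Fraction saved when going from `before` to `after`."""
    return 1 - after / before


for label, data in [
    ("targets", targets_k),
    ("empty build", empty_build_s),
    ("full build", full_build_min),
]:
    print(f"{label}: {saving(data['master'], data['slim']):.0%} saved")
# targets: 41% saved
# empty build: 33% saved
# full build: 30% saved
```

The time savings (about 30% to 33%) are somewhat smaller than the reduction in targets (about 41%), so the proportionality holds only approximately.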
What Dune Reads
Running `strace` on `dune build src/bin_node` tells us that it calls `newfstatat` more than once on those files (the number is the number of times):
2 ".",
2 "dune-project",
2 "dune-workspace",
3 "_build",
4 "ci
8 "data-encoding
8 "images
8 "michelson_test_scripts
14 "client-libs
20 "contrib
24 "docs
39 "devtools
88 "scripts
127 "etherlink
298 "brassaia
298 "irmin
383 "tezt
3329 "",
4460 "
7189 "src
31947 "_build
Here are similar results for `openat`:
4 "ci
4 "images
8 "data-encoding
8 "michelson_test_scripts
14 "client-libs
20 "contrib
22 "docs
39 "devtools
57 "brassaia
57 "irmin
87 "scripts
127 "etherlink
375 "tezt
542 "
1330 "src
9533 "_build
It doesn't appear to be wasting significant time on directories that do not contain OCaml code.
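Counts like the ones above can be reproduced from a saved `strace` log with a short script. This is a minimal sketch, not the command that was actually used: it assumes the default `strace` line format, in which the first quoted argument of `newfstatat` and `openat` is the path, and the function name `count_by_top_level` is made up for the example.

```python
import re
from collections import Counter

# Matches e.g.
#   newfstatat(AT_FDCWD, "src/bin_node/dune", {...}, 0) = 0
#   openat(AT_FDCWD, "_build/log", O_WRONLY|O_CREAT, 0666) = 5
# and captures the syscall name and the first quoted argument.
CALL = re.compile(r'\b(newfstatat|openat)\([^"]*"([^"]*)"')


def count_by_top_level(lines):
    """Count calls per syscall, grouped by top-level path component."""
    counts = {"newfstatat": Counter(), "openat": Counter()}
    for line in lines:
        match = CALL.search(line)
        if match is None:
            continue  # other syscalls, signals, resumed calls, ...
        syscall, path = match.groups()
        if path.startswith("/"):
            top = "/"  # group all absolute paths together
        else:
            top = path.split("/", 1)[0]  # "" for an empty path
        counts[syscall][top] += 1
    return counts
```

For instance, after saving a trace with `strace -f -o trace.log dune build src/bin_node`, calling `count_by_top_level(open("trace.log"))["openat"].most_common()` lists the directories opened most often.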
Work Breakdown
- (days) !13396 (merged) slim mode
  - (hours) proof of concept: patch the manifest to be able to easily deactivate old protocols
  - (hours) patch the build system
  - (hours) measure the time difference to conclude whether it is actually worth it
- (hours) improve DX: ensure one will not commit the slim mode by mistake
  - (hours) communicate about the slim mode so that engineers actually use it
- (hours) !13473 (merged) use slim mode in the CI instead of the current ad-hoc `rm -rf`?
- (days) investigate whether we can improve how we store commit hashes for `--version`
  - (hours) possibly by using a reference that is set by a module that is only linked at the end?
    - seems to have no impact: `dune build src/bin_node` takes 12s with or without this idea
    - those 12s seem to be split into:
      - dune discovering what has to be done (about 7s, also the case if we change nothing)
      - linking the executable (5s)
    - we are already using the `opaque` flag, which should have the same effect as this idea anyway
    - conclusion: give up on this idea
  - (hours) investigate `dune-configurator` or whatever it was called?
    - actually called `dune-build-info`
    - but seems a bit limited: only provides a version number, no commit hash (according to its cram tests)
    - conclusion: giving up on this idea for now
  - (hours) other ideas if the above fail?
    - one idea that proved to work well in the past was to replace a magic number in the generated executable directly, but this may only work on some OSes (it worked on Linux)
    - this is what `dune-build-info` does
    - we can't easily do our own `dune-build-info`, I think, because its documentation says that it requires help (read: hacks) in dune
    - we'd probably not gain that much if the limiting factors are dune discovery and linking anyway
    - conclusion: giving up on this idea for now
- (hours) investigate whether adding some directories to `data_only_files` would help dune
  - see experiment results above
  - conclusion: no significant time is wasted here, probably not worth it given that it would be annoying if we wanted to actually put OCaml code in such directories
- (days) !13561 (merged) opaque mode to compile with `-opaque` to gain time when not modifying interfaces (at the cost of slower tests)
  - (hours) measure the impact on build times in various scenarios
  - (hours) measure the impact on performance by running tests
  - (hours) find a way to make it not too invasive if possible (like the slim mode)
  - (hours) document and communicate
- (days) investigate `dune build`
  - (hours) run `dune build` in sequential mode with timestamped verbose logs
  - (hours) analyze those logs
  - (days) if there are outliers that are particularly long to compile, investigate those in particular
  - (days) may also identify some dependencies that trigger the recompilation of more files than necessary
  - (days) may want to patch dune itself to understand where it spends the initial 12 seconds
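For the log analysis step, one simple approach is to attribute to each log line the time elapsed until the next one and then look at the largest gaps. The sketch below is a minimal example under stated assumptions: each relevant line starts with a timestamp in seconds (however it was added to the log), the build ran sequentially so that gaps can be attributed to a single step, and the function name `slowest_steps` is made up for the example.

```python
def slowest_steps(lines, top=10):
    """Return the `top` largest (duration, message) pairs.

    Each input line is expected to look like "<seconds> <message>".
    The duration attributed to a message is the gap until the next
    timestamped line, which is only meaningful for a sequential build.
    """
    stamped = []
    for line in lines:
        head, _, rest = line.partition(" ")
        try:
            stamped.append((float(head), rest.rstrip("\n")))
        except ValueError:
            continue  # not a timestamped line, skip it
    durations = [
        (next_time - time, message)
        for (time, message), (next_time, _) in zip(stamped, stamped[1:])
    ]
    return sorted(durations, key=lambda d: d[0], reverse=True)[:top]
```

The last timestamped line has no successor and therefore gets no duration; if the final step matters, the log should end with a closing line.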