Gabriel Mazetto requested to merge brodock/dependency-tree into 7.0.10-stable Mar 22, 2021

Description

This is an experimental feature to introduce a dependency tree to Omnibus. With a dependency tree we should be able to build a cache that uses that in order to invalidate only the pieces that changed, instead of everything else in current "single stack" approach.

Current approach

Current cache approach is similar to Docker layered cache. If a layer down in the stack changes, everything else needs to be rebuild.

Current cache layout

It starts with an empty repository, and for each item in the linearized dependency list, it gets a new commit, and that commit gets tagged with a certain format. The format takes among other things the "software name" a hash of the dependency steps executed already and a cache version number. This allows it to fingerprint the exact "state" of the build, for example, if something changes, the hash will be different and therefore can't be reused etc.

Because it relies on the lineralizable dependency list, if something very early changes, everything after it is purged

New proposal

With a dependency tree, we can build each "leaf dependency" in isolation, cache those, than move one step up, cache those again and keep moving and keep caching each trunk in separation before and after merging them together.

A easy way to picture this is as if we were building each "dependency" as its own separate package, which we then reuse when that dependency is required.

In the separate package approach, as long as the dependency is a "runtime" one, you could easily just "rebuild that" and reuse the rest. This is one of the goals here.

A different approach that could be possible is, when a dependency is a "build" one, as long as what is being compiled is also not a "build" dependency for something else, we could again, just isolate and rebuild that part of the dependency tree while reusing the remaining "cached" version.

The risk here is being mitigated by reusing existing linearize algorithm to implement the "merge" order (retrieve from cache each piece, using that order, before "merging" them, as omnibus does today).

New cache proposal

Instead of a single "root" for all the builds, each leaf component will have it's own initial branch (from zeroed content). We will keep building and saving state there until that component is fully built.

Part of the challenge is to sort for the leafs and build those first.

Here is an example:

graph BT
B --> A
C --> A
D --> A
E --> D

In this example we would have distinct branch for for B, C and E starting from an empty state (no previous commit). Then we would have D branch start with E as it's root, reusing everything included there. Then we would finally have A merging B, C and D (using the order from the current lineraziable algorithm, so whatever order it suggests to use there, we would use here), which should give us exact same results, as long as no step leaves it in "dirty" state.

What this allow us it, for example, if C changes, only C needs to be rebuild, all the other can be reconstructed from their own cache, and after we re-build C, to rebuild A we would be able to reuse B D (and E).

Expected results

Depending on how much we will be able to solve the cache problem or isolate the components, I expect that the cache will be almost always "hot", which means we will have a rebuild time as short as possible, most of the time.

Edited Mar 23, 2021 by Gabriel Mazetto

Draft: Experimental Dependency Tree