Improve reproducibility of stack
While (re)building the stack for Zen 4, Sapphire Rapids, and Grace, we noticed that this task was much harder than expected. See https://gitlab.com/eessi/support/-/wikis/building_add_new_cpu_target#warnings--remarks--lessons-learned for some more details.
In order to improve this for the next version of the stack, we should consider ways to make this easier. Ideally, it would be nice if the stack can be easily rebuilt in some way with a chronologically ordered easystack/build script that also includes all the required rebuilds, and which makes it easy to override certain versions with an updated one (e.g. when an additional fix/patch is required for this new CPU target).
Some ideas that we had and which were discussed on Slack:
- Use `--from-commit` everywhere. This makes it even clearer which easyconfig was used and makes us less dependent on the EasyBuild version. The same should also be done for easyblocks then, though.
  - Is there a way to automate this? Having to do it manually is too tedious.
- Extend easystacks with all individual dependencies, to prevent having to rely on `--robot`. By having every individual installation listed, it's easier to keep track of which EB version/commit was used for every installation. We could store these separately from the regular easystacks, and maybe create them automatically using a hook / some script.
  - Rebuilds would have to replace the original entries in an easystack in order to keep the order correct, as this particular installation may be a dependency of something else that gets built later on.
  - We could even consider making one large easystack/yaml file, where every entry may contain additional details like the EasyBuild version. This would not directly work with EasyBuild, but we could easily write a script that parses this and generates real easystacks.
- Create a script that will fetch all required build details from installation directories. I already have a script that can do most of this by taking the easyconfigs/easyblocks from `easybuild` subdirs, and grepping the EasyBuild version and build time from the easybuild log file. We can then sort them by build time and generate easystacks per EB version with chronologically ordered installations.
  - This approach almost did the trick for Sapphire Rapids, except that it's currently not possible to determine the original build time for rebuilt installations, as we've removed the original build log. If we could somehow retain this timestamp, we should have all the information to do an easy rebuild.
  - The current implementation uses a two-step approach: it first creates json files with all the build details (already ordered by build time), allowing one to make manual modifications, then it converts those into a sequence of easystack files named like `001-eb-<eb_version>-<toolchain>.yml`.
  - These scripts currently use an existing stack as ground truth; we could also consider storing this kind of build information in a central location (e.g. something like `/cvmfs/software.eessi.io/versions/2023.06/reprod`).
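
To make the "sort by build time, then generate numbered easystacks" idea a bit more concrete, here is a minimal sketch in Python. It is not the existing script: the record fields (`easyconfig`, `eb_version`, `toolchain`, `build_time`, `from_commit`), the helper names, and the sample data are all made up for illustration, and it assumes build times are available as ISO 8601 strings. It only relies on the easystack layout where an entry can carry `options` such as `from-commit`.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical build records, e.g. as loaded from the intermediate json files.
# Field names and values are assumptions for this sketch, not a real format.
SAMPLE_RECORDS = [
    {"easyconfig": "zlib-1.2.13-GCCcore-12.3.0.eb", "eb_version": "4.8.2",
     "toolchain": "GCCcore-12.3.0", "build_time": "2023-11-02T10:15:00"},
    {"easyconfig": "GCCcore-12.3.0.eb", "eb_version": "4.8.2",
     "toolchain": "system", "build_time": "2023-11-01T09:00:00"},
    {"easyconfig": "OpenMPI-4.1.5-GCC-12.3.0.eb", "eb_version": "4.9.0",
     "toolchain": "GCC-12.3.0", "build_time": "2024-01-20T14:00:00",
     "from_commit": "deadbeef"},  # placeholder, not a real commit
]


def replace_entry(records, old_easyconfig, new_easyconfig, from_commit=None):
    """Swap in an updated easyconfig but keep the original build time,
    so the entry stays at its original position in the build order."""
    updated = []
    for rec in records:
        if rec["easyconfig"] == old_easyconfig:
            rec = {**rec, "easyconfig": new_easyconfig}
            if from_commit:
                rec["from_commit"] = from_commit
        updated.append(rec)
    return updated


def render_easystack(records):
    """Render a list of records as easystack YAML text."""
    lines = ["easyconfigs:"]
    for rec in records:
        commit = rec.get("from_commit")
        if commit:
            lines.append(f"  - {rec['easyconfig']}:")
            lines.append("      options:")
            lines.append(f"        from-commit: {commit}")
        else:
            lines.append(f"  - {rec['easyconfig']}")
    return "\n".join(lines) + "\n"


def generate_easystacks(records):
    """Return [(filename, yaml_text), ...] in chronological build order.

    A new file is started whenever the (EB version, toolchain) pair changes,
    so the numbering of the files preserves the overall build order.
    """
    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r["build_time"]))
    key = lambda r: (r["eb_version"], r["toolchain"])
    stacks = []
    for idx, ((eb_version, toolchain), group) in enumerate(groupby(ordered, key=key), start=1):
        name = f"{idx:03d}-eb-{eb_version}-{toolchain}.yml"
        stacks.append((name, render_easystack(list(group))))
    return stacks


if __name__ == "__main__":
    for name, text in generate_easystacks(SAMPLE_RECORDS):
        print(f"# {name}\n{text}")
```

Grouping consecutive records (rather than all records per EB version) is a deliberate choice here: it keeps a later installation from being moved ahead of something it depends on. `replace_entry` reflects the point about rebuilds above: the updated easyconfig inherits the original build time, which only works if that timestamp is retained somewhere.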
TLDR: we need something that stores the build information for all apps (+ rebuilds) included in the stack, and allows us to easily rebuild the entire stack for a new CPU.