Improve scripts for adding a new architecture target to an existing EESSI version

In #145 several attempts were made to improve the reproducibility of the stack so that it would be easier to add, e.g., a new CPU architecture to a given EESSI version.

This essentially ended with

  • scripts to somewhat facilitate this (https://github.com/EESSI/software-layer/pull/1035), which were never finished or merged and were not fully automated (manual intervention was needed)
  • an EasyBuild hook that makes sure reprod directories are copied to /cvmfs/software.eessi.io/versions/$EESSI_VERSION/software/<os>/$EESSI_SOFTWARE_SUBDIR/reprod/<software_name>/<software_version>/<datestamp>

The latter allows us to create improved rebuild scripts. Here, I describe how we might design those scripts. This is under the assumption that we don't replay rebuilds, but instead install only the last build of each piece of software, scheduled at the time of its original (first) build.

  1. It should take a reference architecture/EESSI_SOFTWARE_SUBDIR as an argument
  2. It should create a list of all software available, including versions (e.g. by crawling the /cvmfs/software.eessi.io/versions/$EESSI_VERSION/software/<os>/$EESSI_SOFTWARE_SUBDIR/reprod/<software_name>/<software_version> directories)
  3. For each item in the list, it should get the following information:
     a. Date/time at which the software was initially built. This should be extracted from the name of the first datestamped folder within /cvmfs/software.eessi.io/versions/$EESSI_VERSION/software/<os>/$EESSI_SOFTWARE_SUBDIR/reprod/<software_name>/<software_version>/<datestamp>.
     b. Total build time. Since rebuilds may be done with --module-only, it's probably best to get this from the build log of the first build (i.e. same folder as in (a)).
     c. EasyBuild version. Since we essentially want to do only the last build, the EasyBuild version should be obtained from the build log of the last build. If the software being installed is EasyBuild itself, it may have been bootstrapped from a temporary installation. This would lead to the conclusion that we should install EB version X.Y.Z with EB version X.Y.Z, which is obviously a chicken-and-egg problem. Instead, the script should take an argument to override which EB version is used to install other EB versions.
     d. Toolchain and toolchain version. These should be obtained from the last build.
     e. The paths to the easyblocks that were used for the last installation. E.g. /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen2/reprod/UCX/1.16.0-GCCcore-13.3.0/20251120_201520UTC/easybuild/reprod/easyblocks/*.py
     f. The path to the easyconfig file that was used for the last installation. This should be the one from the easybuild directory, not the easybuild/reprod directory (see #145 (comment 2469112712)). E.g. /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen2/reprod/UCX/1.16.0-GCCcore-13.3.0/20251120_201520UTC/easybuild/UCX-1.16.0-GCCcore-13.3.0.eb. I think patches are found because EB automatically searches for patches in the directory in which the easyconfig resides.
  4. It should order the list of all this software chronologically, as builds should be done in the same order for the new target. We may need to make an exception for EasyBuild and just always do these first - I'm not 100% sure.
  5. In order, the elements of the list should be written to an easystack file XXX-eb-<ebversion>.yml, where XXX is a sequence number. As soon as a piece of software is encountered that uses a different EasyBuild version than the previous one, the sequence number should be increased by one so that a new easystack file is created. Additionally, the script should take an argument that sets the desired maximum build time per easystack file. As soon as this is exceeded, the sequence number should also be increased. To avoid infinite looping (which would otherwise happen if the desired maximum build time is lower than the build time of a single piece of software), this should only be done if there is already at least one item in the current easystack file.
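
Steps 4 and 5 could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the `Build` record and its field names are hypothetical, the three-digit zero-padded sequence number starting at 001 is an assumption, and "exceeded" is interpreted here as "adding the next item would push the file over the limit".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    """Hypothetical record holding the information gathered in step 3."""
    name: str               # e.g. "UCX"
    version: str            # e.g. "1.16.0-GCCcore-13.3.0"
    first_built: datetime   # 3a: timestamp of the first datestamped folder
    build_time: timedelta   # 3b: total build time from the first build log
    eb_version: str         # 3c: EasyBuild version used for the last build


def split_into_easystacks(builds, max_build_time):
    """Return a list of (easystack_filename, [Build, ...]) in build order.

    A new file is started when the EasyBuild version changes, or when adding
    the next build would exceed max_build_time. Both checks only apply when
    the current file already holds at least one item, so a single build that
    is longer than max_build_time still ends up in a file of its own.
    """
    chunks = []
    current, current_time = [], timedelta(0)
    for build in sorted(builds, key=lambda b: b.first_built):  # step 4
        if current:
            version_changed = current[-1].eb_version != build.eb_version
            too_long = current_time + build.build_time > max_build_time
            if version_changed or too_long:
                chunks.append(current)
                current, current_time = [], timedelta(0)
        current.append(build)
        current_time += build.build_time
    if current:
        chunks.append(current)
    # Assumption: sequence numbers start at 1 and are zero-padded to 3 digits.
    return [
        (f"{seq:03d}-eb-{chunk[0].eb_version}.yml", chunk)
        for seq, chunk in enumerate(chunks, start=1)
    ]
```

Writing the actual easystack files and the possible "EasyBuild first" exception from step 4 are left out of this sketch.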

Alternatively, if we want to replay rebuilds as well, we should adapt point 3 so that it simply extracts all the information per datestamped folder, and loops over every one of them. If e.g. software X was built 3 times, it would appear in the list 3 times, at 3 different times, with (potentially) different EB versions, and pointing to different easyconfigs and easyblocks. This approach would consume more build time (though the number of rebuilds is usually limited), but is more accurate, and the resulting script would be slightly more straightforward.
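
The difference between the two approaches comes down to which datestamped folders are selected per piece of software. A minimal sketch, assuming the reprod layout <software_name>/<software_version>/<datestamp>/easybuild and a datestamp format inferred from the example 20251120_201520UTC (the function names and the `replay_rebuilds` flag are hypothetical):

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumption: datestamps look like the example 20251120_201520UTC.
DATESTAMP_FORMAT = "%Y%m%d_%H%M%SUTC"


def parse_datestamp(name):
    """Return a timezone-aware UTC datetime, or None if name is not a datestamp."""
    try:
        parsed = datetime.strptime(name, DATESTAMP_FORMAT)
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


def list_builds(reprod_root, replay_rebuilds=False):
    """Return (timestamp, name, version, easybuild_dir) tuples, oldest first.

    Without replay: one entry per software version, ordered by the time of
    its first build but pointing at the easybuild directory of its last build.
    With replay: one entry per datestamped folder.
    """
    entries = []
    for version_dir in Path(reprod_root).glob("*/*"):
        if not version_dir.is_dir():
            continue
        stamped = []
        for candidate in version_dir.iterdir():
            timestamp = parse_datestamp(candidate.name)
            if candidate.is_dir() and timestamp is not None:
                stamped.append((timestamp, candidate))
        if not stamped:
            continue
        stamped.sort(key=lambda item: item[0])
        name, version = version_dir.parent.name, version_dir.name
        if replay_rebuilds:
            for timestamp, folder in stamped:
                entries.append((timestamp, name, version, folder / "easybuild"))
        else:
            first_timestamp = stamped[0][0]
            last_folder = stamped[-1][1]
            entries.append((first_timestamp, name, version, last_folder / "easybuild"))
    return sorted(entries, key=lambda entry: entry[0])
```

Extracting the build time, EasyBuild version and toolchain from the build logs is deliberately not shown, since that depends on the log format.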

A final remark is that when adding a new architecture, in some cases you may want to use the latest EasyBuild. E.g. for zen5 there are patches that add zen5 support to OpenBLAS. This may require an option to override the EB version for that particular piece of software, and to then also use the upstream easyconfig & easyblock instead of the ones from the reprod dir. We will not implement such an option in the first iteration of these scripts, though.

An example of how this played out with the old scripts is here

Edited Feb 16, 2026 by Caspar van Leeuwen