CMake picking up on Python from compat layer during configure
We've seen this in several cases:
And now Tim (SURF) has encountered it for both casacore and AOFlagger in this PR.
It made us wonder why this was going wrong. Tim checked the PATH:
[EESSI pilot 2023.06] $ echo $PATH
/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/Python/3.10.4-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/SQLite/3.38.3-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/Tcl/8.6.12-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/HDF5/1.12.2-gompi-2022a/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/GSL/2.7-GCC-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/FFTW/3.3.10-GCC-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/FlexiBLAS/3.2.0-GCC-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/OpenMPI/4.1.4-GCC-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/UCC/1.0.0-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/PMIx/4.1.2-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/libfabric/1.15.1-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/UCX/1.12.1-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/libevent/2.1.12-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/OpenSSL/1.1/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/hwloc/2.7.1-GCCcore-11.3.0/sbin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/hwloc/2.7.1-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/libxml2/2.9.13-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software
/numactl/2.0.14-GCCcore-11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/GCCcore/11.3.0/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/EasyBuild/4.8.2/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/x86_64/usr/bin:/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/x86_64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
That is completely correct: the Python module comes first, well ahead of the compat layer directories. We then inspected the CMake logic for FindPython here. It says:
Python_FIND_STRATEGY
  .. versionadded:: 3.15

  This variable defines how lookup will be done. The Python_FIND_STRATEGY variable can be set to one of the following:

  * VERSION: Try to find the most recent version in all specified locations. This is the default if policy CMP0094 is undefined or set to OLD.
  * LOCATION: Stops lookup as soon as a version satisfying version constraints is founded. This is the default if policy CMP0094 is set to NEW.
OK, so that is the problem: CMake's Python_FIND_STRATEGY is VERSION when policy CMP0094 is unset or set to OLD. That means it looks for the newest version of Python it can find across all search locations, and uses that. The LOCATION strategy (the default when policy CMP0094 is set to NEW) stops the lookup as soon as a version satisfying the constraints is found. This suggests (though it is not explicit) that it will respect the order in which things appear on PATH(?). Note that, according to the documentation quoted above, the default strategy is VERSION when the policy is unset, i.e. it will pick up the newest version, which is what causes the issues we are seeing.
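To make the difference between the two strategies concrete, here is a toy model in Python. It is only a sketch of the documented behaviour, not CMake's actual implementation (which also considers hints, virtual environments and more). The `Candidate` type, the `pick` function, the `python3` executable names and the compat layer version number are all made up for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    path: str                 # interpreter location, listed in search (PATH) order
    version: Tuple[int, ...]  # e.g. (3, 10, 4)


def pick(candidates: List[Candidate], minimum: Tuple[int, ...], strategy: str) -> Optional[Candidate]:
    """Toy model of the two documented strategies; not CMake's real logic."""
    acceptable = [c for c in candidates if c.version >= minimum]
    if not acceptable:
        return None
    if strategy == "LOCATION":
        # Stop at the first acceptable interpreter in search order.
        return acceptable[0]
    if strategy == "VERSION":
        # Look at every location and keep the most recent version.
        return max(acceptable, key=lambda c: c.version)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical search order: the module's Python first, then the compat layer.
# The compat layer version (3.11.0) is an assumption, not taken from the logs.
candidates = [
    Candidate("/cvmfs/.../software/Python/3.10.4-GCCcore-11.3.0/bin/python3", (3, 10, 4)),
    Candidate("/cvmfs/.../compat/linux/x86_64/usr/bin/python3", (3, 11, 0)),
]

print("LOCATION:", pick(candidates, (3, 6), "LOCATION").path)
print("VERSION: ", pick(candidates, (3, 6), "VERSION").path)
```

In this model, LOCATION returns the module's Python because it comes first in the search order, while VERSION returns the compat layer's Python purely because its version number is higher, which matches what we are seeing.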
I'm wondering if we can somehow change that default behavior, and make sure that when policy CMP0094 is unset, it uses LOCATION by default. Alternatively, we could set the CMP0094 policy to NEW by default. Ideally, we would modify the CMake installations (or the easyblock?) to do this. Even though we noticed this problem in EESSI, I think it should also occur for regular EasyBuild users whenever their system Python is newer than the one used as a dependency in their module stack.
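One possible shape for such a global default, sketched in Python: append the relevant `-D` options to whatever configure options a build already has, unless the build sets them itself. `CMAKE_POLICY_DEFAULT_CMP0094`, `Python_FIND_STRATEGY` and `Python3_FIND_STRATEGY` are real CMake variables; the helper name `add_python_lookup_defaults` is hypothetical, and wiring it into the CMakeMake easyblock is an assumption, not existing EasyBuild behaviour.

```python
import shlex

# Real CMake variables; applying them to every CMake-based build is the assumption.
DEFAULT_FLAGS = {
    "CMAKE_POLICY_DEFAULT_CMP0094": "NEW",
    "Python_FIND_STRATEGY": "LOCATION",
    "Python3_FIND_STRATEGY": "LOCATION",
}


def add_python_lookup_defaults(configopts: str) -> str:
    """Append default -D options that configopts does not already set (hypothetical helper)."""
    parts = shlex.split(configopts)
    for name, value in DEFAULT_FLAGS.items():
        # Respect an explicit choice, with or without a type (-DNAME=... or -DNAME:TYPE=...).
        already_set = any(p.startswith((f"-D{name}=", f"-D{name}:")) for p in parts)
        if not already_set:
            parts.append(f"-D{name}={value}")
    return shlex.join(parts)


print(add_python_lookup_defaults("-DCMAKE_BUILD_TYPE=Release"))
```

Options that a build already sets explicitly are left untouched, so an individual installation could still opt back into VERSION if it really needs to.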
We should think about a global solution to this issue. I'd much rather modify all CMake installations (or the CMakeMake easyblock) than do this for each CMake-based installation individually. It is extremely easy to miss: AOFlagger, for example, installed fine; Tim just happened to see in the configure logs that it had picked up the Python from the compat layer.
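Since this is so easy to miss, an automated check might help. Below is a minimal Python sketch that scans the text of a CMakeCache.txt for Python-related entries pointing into an unwanted prefix such as the compat layer. The function name `python_entries_under` is hypothetical, and the cache entry in the sample (`PYTHON_EXECUTABLE`) is only illustrative; the exact entry names depend on which find module a project uses.

```python
import re

# Matches CMakeCache.txt entries of the form NAME:TYPE=VALUE,
# skipping comment lines that start with '#' or '//'.
CACHE_ENTRY = re.compile(r"^([^#/\s][^:=]*):([A-Z]+)=(.*)$")


def python_entries_under(cache_text: str, unwanted_prefix: str) -> dict:
    """Return cache entries that mention Python and point below unwanted_prefix."""
    hits = {}
    for line in cache_text.splitlines():
        match = CACHE_ENTRY.match(line)
        if not match:
            continue
        name, _type, value = match.groups()
        if "python" in name.lower() and value.startswith(unwanted_prefix):
            hits[name] = value
    return hits


COMPAT_PREFIX = "/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/"

# Illustrative cache content, not taken from a real build.
sample = """\
# This is the CMakeCache file.
//Path to a program.
PYTHON_EXECUTABLE:FILEPATH=/cvmfs/pilot.eessi-hpc.org/versions/2023.06/compat/linux/x86_64/usr/bin/python3
CMAKE_BUILD_TYPE:STRING=Release
"""

for name, value in python_entries_under(sample, COMPAT_PREFIX).items():
    print(f"WARNING: {name} points into the compat layer: {value}")
```

A check like this could run after the configure step and fail the build (or at least warn) instead of relying on someone reading the configure logs.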