srun only uses a single core by default
Summary
When submitting an ML model training run via the Mantik compute backend API, only one core is used to execute the job by default. This behaviour cannot be overridden in our backend config.
Steps to reproduce
- Submit a run to JUWELS with code that uses parallelization
- Monitor performance with llview
What is the current bug behavior?
Only a single core is used to train a parallelized model.
Details
There has been a change to the behaviour of the srun command:
Information - New Slurm version 22.05!
On 28. Feb Slurm has been upgraded to version 22.05.
Important changes from 21.08 to 22.05:
- srun will no longer read in SLURM_CPUS_PER_TASK and will not inherit option
--cpus-per-task from sbatch! This means you will explicitly have to specify
--cpus-per-task to your srun calls, or set the new SRUN_CPUS_PER_TASK env
var. If you want to keep using --cpus-per-task with sbatch then you will
have to add: "export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}".
- Using the option --cpus-per-task in 22.05 does imply --exact, which
means that each step with --cpus-per-task will now only get the minimum
number of cores. The pinning will change (implication on the performance)
and the tasks will fill the HW threads of same cores. If you don’t use SMT
and want to keep old behavior as before where your threads run only on
real cores then add this to srun: "--threads-per-core=1".
(from JUWELS ssh welcome message)
We create a batch script that then invokes srun singularity run .... At the moment there is no way to pass flags to srun.
What is the expected correct behavior?
- srun flags can be passed
- parallelized models make use of multiple cores on JUWELS
Possible fixes
- Extend backend config to include srun flags
- Add those flags when building the batch script for UNICORE
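For illustration only (the flag values are examples and the config entry is hypothetical, it does not exist yet), the batch script built for UNICORE could then end up looking roughly like this, with the configured flags appended to the srun call:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # placeholder resource request

# Hypothetical: the flags below would be taken from a new srun-flags entry
# in the backend config and inserted by the batch-script builder.
srun --cpus-per-task="${SLURM_CPUS_PER_TASK}" --threads-per-core=1 \
    singularity run train.sif    # placeholder image; actual command as built by the backend
```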