Charliecloud issue #230: recommend how to run non-MPI worker daemons on a Slurm cluster

Many new and interesting programming frameworks require starting a worker daemon on each node in a cluster. The obvious choice is to use `srun`, but that does not work. Quoting from the tutorial regarding Apache Spark:

> Next, start one worker on each compute node. This is a little ugly; `mpirun` will wait until everything is finished before returning, but we want to start the workers in the background, so we add `&` and introduce a race condition. (`srun` has different, even less helpful behavior: it kills the worker as soon as it goes into the background.)
>
>     $ mpirun -map-by '' -pernode ch-run -b ~/sparkconf /var/tmp/spark -- \
>       /spark/sbin/start-slave.sh $MASTER_URL &

In this case, the script `start-slave.sh` daemonizes a worker child and then exits, at which point `srun` kills the worker. Moreover, even if the script did not daemonize itself, `srun` cannot simply be backgrounded with `&`: a subsequent `srun` will wait for the backgrounded one to complete.

Use cases so far include Spark and FUSE. The `mpirun` workaround above is awkward because MPI should not be needed merely to start one process on each node. Other workarounds include `pdsh`, GNU Parallel, and similar tools. Ideally, we would find a way to make `srun` do what we want, since it is the native way to start processes on Slurm nodes.

This issue is to settle on a recommendation and propagate it through the documentation and examples. See also #156 and #160; the former includes a reproducer script for the Spark behavior.
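One possible workaround for the first problem (`srun` killing the worker once `start-slave.sh` exits) is a minimal wrapper, sketched here under the assumption that Slurm cleans up the daemonized worker only because the step's task has exited. The wrapper runs the start command and then stays in the foreground until it is signaled, so the task never exits on its own. The script name `hold-worker.sh` and the function `hold_after_start` are hypothetical; they are not part of Charliecloud, Spark, or Slurm, and this approach has not been verified on a real cluster.

```sh
#!/bin/sh
# hold-worker.sh (hypothetical helper, not part of Charliecloud or Slurm).
# Usage: hold-worker.sh START_COMMAND [ARGS...]
#
# Runs a start command that daemonizes a worker and then exits (for example,
# Spark's start-slave.sh), then keeps this process in the foreground so the
# launcher still sees a live task. Assumption: the launcher only cleans up
# leftover processes once this task exits.

hold_after_start () {
    # Run the start command; if it fails, pass its exit status back.
    "$@" || return "$?"

    # Stay alive until signaled. On TERM or INT, stop the current sleep and
    # exit cleanly. Sleeping in the background and calling 'wait' lets the
    # trap run immediately instead of after the sleep finishes.
    sleeper=
    trap 'kill "$sleeper" 2>/dev/null; exit 0' TERM INT
    while :; do
        sleep 3600 &
        sleeper=$!
        wait "$sleeper"
    done
}

# Do nothing when run without arguments.
if [ "$#" -gt 0 ]; then
    hold_after_start "$@"
fi
```

On a cluster, one might then launch it as `srun --ntasks-per-node=1 ./hold-worker.sh ch-run -b ~/sparkconf /var/tmp/spark -- /spark/sbin/start-slave.sh $MASTER_URL &`. This does not address the second problem: the backgrounded step presumably still holds its share of the allocation, so a later `srun` may still block. Slurm 20.11 and later provide `srun --overlap`, which lets steps share resources, but whether it is available and sufficient depends on the site's Slurm version and configuration.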