add detailed info on how to help preemptable jobs get requeued
Summary notes on requeuing preemptable jobs:
With the current Slurm, a preemptable job that is meant to be requeued needs to:
- Launch its main workload via srun so it will get the warning signals
- The workload launched via srun needs to trap the signal requested with --signal, or SIGTERM if --signal is not used (assuming we enable PreemptParameters=send_user_signal)
- If the workload is a bash script, it needs to respect bash's rules for trapping signals: bash only runs a trap handler once the current foreground command has finished, so run the long-running part in the background and wait for it.
- When the signal handler is triggered, meaning the job has been warned that it is about to be preempted, the workload should clean up so that it is ready to be requeued
- It should then run sleep 600 or similar, instead of exiting straight away, to ensure the job gets marked as preempted and is requeued.
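
The steps above could be sketched roughly as follows for a bash workload started with srun. This is only an illustration under assumptions: the names run_preemptable, on_preempt, GRACE_SLEEP, PREEMPTED and WORK_PID are hypothetical, the trapped signal is SIGTERM (the no --signal case), and the cleanup step is a placeholder for whatever the real workload needs.

```shell
#!/bin/bash
# Sketch of a worker script intended to be launched via srun.
# All function and variable names here are hypothetical.

GRACE_SLEEP="${GRACE_SLEEP:-600}"   # seconds to linger after cleanup
PREEMPTED=0                         # set to 1 once the warning signal arrives
WORK_PID=""                         # PID of the backgrounded workload

on_preempt() {
    PREEMPTED=1
    # Stop the background workload if it is still running, then reap it.
    if [ -n "$WORK_PID" ]; then
        kill -TERM "$WORK_PID" 2>/dev/null || true
        wait "$WORK_PID" 2>/dev/null || true
    fi
    # Workload-specific cleanup (e.g. writing a checkpoint) would go here.
    # Linger instead of exiting, so that Slurm ends the job as preempted.
    sleep "$GRACE_SLEEP"
}

run_preemptable() {
    # Replace TERM with the signal named in --signal if one is requested.
    trap on_preempt TERM
    "$@" &                          # long-running part runs in the background
    WORK_PID=$!
    status=0
    # wait returns as soon as a trapped signal arrives, so the handler
    # runs promptly instead of after the workload finishes.
    wait "$WORK_PID" || status=$?
    trap - TERM
    return "$status"
}
```

A worker script would end with a call such as run_preemptable followed by the real command and its arguments, and the batch script would start that worker with srun so that the signal reaches it.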