Test an eventlet/greenthread worker pool for SSH connection pooling

The workers are either memory-bound (if the engines run directly in the workers, as with LocalExecutor) or, probably, I/O-bound (if the engines run remotely, as with SshExecutor, LsfExecutor, and SlurmExecutor).

We previously tested a prefork pool in the workers for running multiple Nextflow processes locally. This reduced the memory consumption of the workers compared to running one worker process/thread per container. Note that we did not identify whether this was because the workers themselves needed less memory or because the Nextflow processes shared memory (e.g. for the libraries). Another important point is that SSH connections cannot be shared between OS threads (started by Celery via `multiprocessing`). The asyncio event loop is likewise not shared between threads.

For I/O-bound processes, another option may be to use eventlets or greenthreads (https://www.distributedpython.com/2018/10/26/celery-execution-pool/). The advantage would be that we could share SSH connections between different run_command invocations running in the same Python process. It may thus be possible to keep the number of SSH connections limited, e.g. one per single-threaded worker container (so 64 for a 64-core Swarm node), while sharing each connection between many submissions: running 8 greenthreaded submissions per worker would yield 512 parallel workflow runs (e.g. bwait).
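A minimal sketch of the sharing idea: one connection per host, held in a per-process pool that all greenlets in the same worker process can see. The class and its `open_connection` parameter are illustrative names, not existing code; the real connect call (e.g. via an SSH library) would be passed in its place. It assumes gevent's monkey patching is in effect (as happens when the worker is started with a gevent pool), which makes `threading.Lock` greenlet-safe.

```python
import threading


class ConnectionPool:
    """Keep one shared connection per host for the current worker process.

    Under a gevent pool, all greenlets of a worker run in the same
    process, so they all see this pool and reuse its connections.
    """

    def __init__(self, open_connection):
        # open_connection is a hypothetical stand-in for the real
        # SSH connect call (e.g. a client library's connect function).
        self._open = open_connection
        # With gevent monkey patching, this lock is greenlet-safe; it
        # prevents two tasks from opening the same connection at once.
        self._lock = threading.Lock()
        self._connections = {}

    def get(self, host):
        with self._lock:
            if host not in self._connections:
                self._connections[host] = self._open(host)
            return self._connections[host]


# Usage with a dummy "connection" factory: repeated lookups for the
# same host return the same shared object.
pool = ConnectionPool(lambda host: f"connection-to-{host}")
first = pool.get("node1")
second = pool.get("node1")
other = pool.get("node2")
print(first is second)   # same connection reused
print(first is other)    # different host, different connection
```

The lock matters even with cooperative greenlets, because opening a connection yields to the event loop mid-way, and two tasks could otherwise race to create duplicates.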

  • Add `gevent` to the conda environment of the worker (gevent seems more popular than eventlet)
  • Add `--pool gevent` to the worker start-up command
  • Have a look here for a template with a DB connection as an example. Maybe we can implement this in a similar way.
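The first two steps could look roughly as follows; the environment name `worker-env` and app module `worker_app` are placeholders, and the concurrency value is illustrative (8 greenthreads per worker, matching the calculation above).

```shell
# Install gevent into the worker's conda environment (names are placeholders)
conda install -n worker-env -c conda-forge gevent

# Start the Celery worker with a gevent pool instead of prefork
celery -A worker_app worker --pool gevent --concurrency 8
```

Note that Celery applies gevent's monkey patching itself when started this way, so blocking I/O in tasks cooperatively yields between greenthreads.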
Edited by Philip Reiner Kensche