Let dantro take care of setting up a dask client, scheduler, etc.
Inspired by the dask preview that @peanutfun gave me, I'm wondering: would it make sense to let dantro take care of setting up all the dask objects necessary to perform data transformations?

As far as I can see, this would "only" require setting up a `distributed.Client` at a sensible point and allowing its parameters to be configured. At teardown, the client should also be closed.
Questions:
- Would this be useful to include in dantro directly?
- If so, where should this client live and how should it be initialized? Can the number of workers be changed afterwards? I could imagine the following options:
  - There is a single global `dantro._dask.Client`, which can be started from anywhere and which retains the initially configured properties. The client would be closed only upon program shutdown (the module going out of scope).
    - What should happen if we require one set of parameters (e.g. number of workers) at one point and a different set at another? Should multiple clients be possible? Should the "latest" client set itself as the default client?
  - The client is local to where it is needed, e.g. to the `TransformationDAG`.
    - While this seems like the logical spot, there is a strong downside: its lifetime would be limited to that of a single plot task (or transformation task, after #338). This seems impractical.
@peanutfun What are your experiences so far? Could you perhaps briefly comment on the questions above and on what you'd consider a sensible implementation? Is there anything else needed to get dask to work aside from the `Client`?