Let PlotManager use multiprocessing
As brought up in utopia#56 (moved), the plotting framework might greatly benefit from multiprocessing… if done right.
There are a bunch of things to look out for:
- Sharing data between workers will not bring a large performance benefit, because we would have to fall back to threading, where the GIL blocks actual parallel execution.
- Plotting with several processes has a performance benefit, but the disadvantage that data potentially has to be loaded once per process and then occupies memory multiple times…
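
The trade-off between the two points above can be tried out with a small sketch. It assumes a standard CPython build with the GIL enabled; `render_plot` is a hypothetical CPU-bound stand-in and not part of the plotting framework, and the helper names are made up for illustration.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def render_plot(n: int) -> int:
    """Hypothetical stand-in for a CPU-bound plot function (pure-Python loop)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_plots(executor_cls, sizes, max_workers=4):
    """Run render_plot for every size; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(render_plot, sizes))
    return results, time.perf_counter() - start


def compare_executors(sizes):
    """Print timings for thread-based and process-based execution."""
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, elapsed = run_plots(executor_cls, sizes)
        print(f"{executor_cls.__name__}: {elapsed:.2f}s")
```

`compare_executors` should be called from under an `if __name__ == "__main__":` guard, because process pools may re-import the main module in the workers. For a pure-Python loop like this one, the thread pool would typically not be faster than serial execution, while the process pool can use several cores at the cost of each process holding its own copy of any data it loads.
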
Open Questions
- At which level should multiprocessing set in?
  - The straightforward level would be that of the plot configuration, i.e. each `PlotManager.plot` call being run in a separate process.
  - An alternative would be that of the `PlotManager._plot` call, which would also support plots from `ParamSpace` plot configs in their own processes.
  - Letting each creator invocation run in its own process is probably the best option…
- Individual processes or pool workers?
  - Pool workers, definitely. These could be fed with "plot tasks" via a queue. Ideally, the queues would be populated in a smart fashion, such that plots on similar data (we can assume this for `ParamSpace`-plots) are preferentially grabbed by the same worker.
- How should the data tree object be handled? Is there any way to share it between processes?
  - This is the difficult part. It is probably not easy to share the data or pipe it back and forth …
  - For DAG-based plots, we would only need to pass the DAG result to the process instead of the whole tree – but ...
    - that's rather late in the plotting process
    - does not cover all plotting cases
    - leaves all the potentially heavy computation in the parent process
  - ... so that's not really an option.
  - Having separate data trees is probably the most convenient approach. This might need a lot of memory, but if the number of processes is configurable by the user, it should be ok...? Also, #72 could help to free resources in the individual processes.
- Other things to figure out:
  - Logging and user-communication
  - File conflicts?
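
To make the pool-worker idea more concrete, here is a minimal sketch of how plot tasks on the same data could end up on the same worker, with each worker loading its own separate data tree. Everything in it is an assumption for illustration: `PlotTask`, `data_key`, `load_data_tree` and the other names are hypothetical and not part of the existing framework. Instead of a live queue, it precomputes one batch of tasks per worker, which is a simpler stand-in for the "smart" queue population described above.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PlotTask:
    """Hypothetical description of one plot; not part of the framework."""
    name: str
    data_key: str  # identifies the data the plot needs, e.g. a file path


def assign_tasks(tasks: List[PlotTask], num_workers: int) -> List[List[PlotTask]]:
    """Distribute tasks such that tasks with the same data_key share a worker."""
    groups: Dict[str, List[PlotTask]] = defaultdict(list)
    for task in tasks:
        groups[task.data_key].append(task)

    batches: List[List[PlotTask]] = [[] for _ in range(num_workers)]
    # Largest groups first, each onto the currently least-loaded worker
    for group in sorted(groups.values(), key=len, reverse=True):
        min(batches, key=len).extend(group)
    return batches


def load_data_tree(data_key: str) -> dict:
    """Placeholder for loading a separate data tree inside a worker."""
    return {"key": data_key}


def run_batch(batch: List[PlotTask]) -> List[str]:
    """Runs in a worker process: load each data tree once, then plot."""
    trees: Dict[str, dict] = {}
    done = []
    for task in batch:
        if task.data_key not in trees:
            trees[task.data_key] = load_data_tree(task.data_key)
        # A real implementation would invoke the plot creator here
        done.append(f"{task.name} <- {trees[task.data_key]['key']}")
    return done


def run_all(tasks: List[PlotTask], num_workers: int = 2) -> List[str]:
    """Submit one batch per worker process and collect the results."""
    batches = [b for b in assign_tasks(tasks, num_workers) if b]
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return [line for result in pool.map(run_batch, batches) for line in result]
```

`run_all` should be called from under an `if __name__ == "__main__":` guard. Because the data is loaded inside `run_batch`, only the small task descriptions and result strings cross the process boundary; the memory cost is then one data tree per worker and data key, which the user could limit via `num_workers`. Logging and output file conflicts are not addressed by this sketch.
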