
Let PlotManager use multiprocessing

As brought up in utopia#56 (moved), the plotting framework might greatly benefit from multiprocessing… if done right.

There are a bunch of things to look out for:

  • Sharing data between workers will not bring a large performance benefit, because we would have to fall back to threading, where the GIL prevents CPU-bound Python code from actually executing in parallel.
  • Plotting with several processes does have a performance benefit, but with the disadvantage that data potentially has to be loaded once per process and then occupies memory once per process…

Open Questions

  • At which level should multiprocessing be applied?
    • The straightforward level would be that of the plot configuration, i.e. each PlotManager.plot call being run in a separate process.
    • An alternative would be that of the PlotManager._plot call, which would also support plots from ParamSpace plot configs in their own processes.
    • Letting each creator invocation run in its own process is probably the best option…
  • Individual processes or pool workers?
    • Pool workers, definitely. These could be fed with "plot tasks" via a queue. Ideally, the queues would be populated in a smart fashion, such that plots on similar data (we can assume this for ParamSpace-plots) are preferentially grabbed by the same worker.
  • How should the data tree object be handled? Is there any way to share it between processes?
    • This is the difficult part. Sharing the data or piping it back and forth between processes is probably not easy …
    • For DAG-based plots, we would only need to pass the DAG result to the process instead of the whole tree – but ...
      • that's rather late in the plotting process
      • does not cover all plotting cases
      • leaves all the potentially heavy computation in the parent process
    • ... so that's not really an option.
    • Having separate data trees is probably the most convenient approach. This might need a lot of memory, but if configurable by the user, it should be ok...? Also, #72 could help to free resources in the individual processes.
  • Other things to figure out:
    • Logging and user-communication
    • File conflicts?
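
Some of the points above can be sketched with the standard library alone. This is only a rough illustration under assumptions, not a proposal for the actual implementation: instead of a smart queue, it simply groups tasks by a data key, so that all plots on the same data end up in the same worker. It also gives each plot a unique output path and tags log messages with the process name. All function names, the `(plot_name, data_key)` task format and the `.pdf` naming scheme are hypothetical.

```python
import logging
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def batch_by_data(tasks):
    """Group (plot_name, data_key) tasks so each batch shares one data_key."""
    batches = defaultdict(list)
    for plot_name, data_key in tasks:
        batches[data_key].append(plot_name)
    return dict(batches)


def output_path(out_dir, data_key, plot_name):
    """Give every plot a unique file so workers never write the same path."""
    return Path(out_dir) / f"{data_key}__{plot_name}.pdf"


def run_batch(data_key, plot_names, out_dir):
    """Runs inside a worker: handle all plots of one data_key in sequence."""
    # Configure logging per process; %(processName)s identifies the worker.
    logging.basicConfig(
        level=logging.INFO, format="%(processName)s %(levelname)s %(message)s"
    )
    log = logging.getLogger("plot_worker")
    paths = []
    for plot_name in plot_names:
        path = output_path(out_dir, data_key, plot_name)
        log.info("would plot %s to %s", plot_name, path)  # plotting goes here
        paths.append(str(path))
    return paths


def run_all(tasks, out_dir, max_workers=2):
    """Submit one batch per data_key to a process pool and collect paths."""
    batches = batch_by_data(tasks)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_batch, key, names, out_dir)
            for key, names in batches.items()
        ]
        return [path for fut in futures for path in fut.result()]
```

Batching per data key is cruder than a queue with worker affinity, because one large batch can leave other workers idle, but it keeps each data set loaded in only one process at a time. As with any process pool, `run_all` should be called from under an `if __name__ == "__main__":` guard.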