Investigate what is needed to support 'spawning' the scheduler as a subprocess
Background
This is related to two main strands of work: running the UI in a separate process from the scheduler, #1036 (closed), and #1021, supporting native Windows in BuildStream.
The PoC !1613 (closed) of running `bst build` in a Python multiprocessing subprocess is in a WIP state; however, it uses `fork`, which is the default start method on Unix. `spawn` is also supported on Unix, but comes with a known performance hit. To gain the benefits of having the frontend operate in a separate process natively on Windows, `spawn` needs to be supported. The main issue with Python's multiprocessing `spawn` implementation is that it launches a new interpreter, and everything passed to the entry point must be pickle-able (note that we also forcefully prevent Stream & Scheduler references from being dumped by pickle at all).
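For reference, the following minimal sketch (not BuildStream code) illustrates the constraint: with the `spawn` start method, the target callable and its arguments are pickled into a freshly launched interpreter, so anything unpicklable, such as an object holding an open handle or a Scheduler reference, cannot cross the process boundary.

```python
import multiprocessing
import pickle


def entry_point(config):
    # Runs in a freshly launched interpreter: only what was pickled is available.
    print("subprocess received:", config)


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")

    # Plain data pickles fine and can cross the process boundary.
    proc = ctx.Process(target=entry_point, args=({"command": "build"},))
    proc.start()
    proc.join()

    # An unpicklable attribute (e.g. an open file, a lock, or a
    # Stream/Scheduler reference) makes the whole argument fail to pickle.
    unpicklable = {"handle": open(__file__)}
    try:
        pickle.dumps(unpicklable)
    except TypeError as e:
        print("cannot pickle:", e)
```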
Whilst work has been done to rework/avoid having to pickle objects such as Elements across the implemented multiprocessing notification queues that handle the abstraction across the Stream-Scheduler process split, the actual method of running the entry points (in this case, a command such as `bst build`) in a pickle-able way for `spawn` still needs to be assessed. As it stands there is no clear way to achieve this, as in essence the whole 'backend process' state of BuildStream needs to be transferred between the processes.
Task description
- Investigate what can be initialised/computed after the point of spawning in the subprocess, thus reducing pickling overheads
- Look at using the `__getstate__` override in stream()/scheduler() to modify the object's `__dict__`, essentially returning only what is needed/pickle-able to the dumper whilst un-pickling (see the sketch after this list)
- Assess the implementation of `jobpickler.py`, which is used to transform our `child_jobs` into a pickle-able state, in which the scheduler will then execute them in a spawned subprocess. Some of this internal tinkering is probably transferable.
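As a rough illustration of the `__getstate__` approach (a hypothetical sketch, not the actual Stream/Scheduler code), the override can strip attributes that must not cross the process boundary, leaving the spawned side to reconstruct them after unpickling:

```python
import pickle


class FrontendFacingState:
    """Hypothetical stand-in for an object like Stream that holds both
    plain data and references that must not be pickled."""

    def __init__(self, targets):
        self.targets = targets          # plain, pickle-able data
        self._scheduler = object()      # stand-in for an unpicklable back-end reference

    def __getstate__(self):
        # Start from the full __dict__, then strip anything the spawned
        # process should rebuild for itself rather than receive over pickle.
        state = self.__dict__.copy()
        del state["_scheduler"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate the back-end reference lazily in the new process.
        self._scheduler = None


if __name__ == "__main__":
    original = FrontendFacingState(["hello.bst"])
    restored = pickle.loads(pickle.dumps(original))
    print(restored.targets)       # ['hello.bst']
    print(restored._scheduler)    # None, to be re-initialised after spawn
```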
Acceptance Criteria
An eventual report on the tasks needed to support `spawn` as the multiprocessing start method between the front & back ends.