Investigate what is needed to support 'spawning' the scheduler as a subprocess
Background
This is related to two main strands of work: running the UI in a separate process from the scheduler, #1036 (closed), and #1021, supporting native Windows in BuildStream.
The PoC !1613 (closed) of running `bst build` in a Python multiprocessing subprocess is in a WIP state; however, it uses `fork`, which is the default start method on Unix. `spawn` is also supported on Unix, but comes with a known performance hit. To gain the benefits of having the frontend operate in a separate process natively on Windows, `spawn` needs to be supported. The main issue with Python's multiprocessing `spawn` implementation is that it launches a new interpreter, and everything passed to the entry point must be pickle-able (note that we also forcefully prevent Stream & Scheduler references from being dumped by pickle at all).
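For reference, the following minimal sketch (not BuildStream code) illustrates the constraint: with the `spawn` start method, the target callable and its arguments are pickled into a freshly launched interpreter, so anything unpicklable, such as an object holding an open handle or a Scheduler reference, cannot cross the process boundary.

```python
import multiprocessing
import pickle


def entry_point(config):
    # Runs in a freshly launched interpreter: only what was pickled is available.
    print("subprocess received:", config)


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")

    # Plain data pickles fine and can cross the process boundary.
    proc = ctx.Process(target=entry_point, args=({"command": "build"},))
    proc.start()
    proc.join()

    # An unpicklable attribute (e.g. an open file, a lock, or a
    # Stream/Scheduler reference) makes the whole argument fail to pickle.
    unpicklable = {"handle": open(__file__)}
    try:
        pickle.dumps(unpicklable)
    except TypeError as e:
        print("cannot pickle:", e)
```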
Whilst work has been done to rework/avoid having to pickle objects such as Elements across the implemented multiprocessing notification queues that handle the abstraction across the Stream-Scheduler process split, the actual method of running the entry points (in this case, a command such as `bst build`) in a pickle-able way for `spawn` still needs to be assessed. As it stands there is no clear way to achieve this, as in essence the whole 'backend process' state of BuildStream needs to be transferred between the processes.
Task description
- Investigate what can be initialised/computed after the point of spawning in the subprocess, thus reducing pickling overheads
- Look at using the `__getstate__` override in stream()/scheduler() to modify the object's `__dict__`, essentially returning only what is needed/pickle-able to the dumper whilst un-pickling (see the sketch after this list)
- Assess the implementation of `jobpickler.py`, which is used to transform our `child_jobs` into a pickle-able state, in which the scheduler will then execute them in a spawned subprocess. Some of this internal tinkering is probably transferable.
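As a rough illustration of the `__getstate__` approach (a hypothetical sketch, not the actual Stream/Scheduler code), the override can strip attributes that must not cross the process boundary, leaving the spawned side to reconstruct them after unpickling:

```python
import pickle


class FrontendFacingState:
    """Hypothetical stand-in for an object like Stream that holds both
    plain data and references that must not be pickled."""

    def __init__(self, targets):
        self.targets = targets          # plain, pickle-able data
        self._scheduler = object()      # stand-in for an unpicklable back-end reference

    def __getstate__(self):
        # Start from the full __dict__, then strip anything the spawned
        # process should rebuild for itself rather than receive over pickle.
        state = self.__dict__.copy()
        del state["_scheduler"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate the back-end reference lazily in the new process.
        self._scheduler = None


if __name__ == "__main__":
    original = FrontendFacingState(["hello.bst"])
    restored = pickle.loads(pickle.dumps(original))
    print(restored.targets)       # ['hello.bst']
    print(restored._scheduler)    # None, to be re-initialised after spawn
```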
Acceptance Criteria
An eventual report on the tasks needed to support `spawn` as the multiprocessing start method between the front & back ends.