WIP: Run bst build under a multiprocessing subprocess
Description
This is currently a PoC implementation showing `bst build` running under Python's `multiprocessing` in a subprocess, utilising Queues to handle notifications between the 'front' and 'back' end processes. The tests currently pass, but thought needs to be given to the merge path for this workstream, as well as to ensuring there is a benefit (i.e. increased performance) to doing so. The subprocess handling and notification polling are very rudimentary and need to be fully assessed. This is part of #1036 (closed). The main aim is to move the overhead of calling `click.echo`
into a separate process from the scheduler, hopefully allowing more 'work' to be done and leading to quicker build times on WSL by utilising more CPU time across processes.
Known issues:
- `widget.py` print summary needs access to resolved keys at the end of the run where keys have been acquired, e.g. tracked elements.
- Shelling into a failed build element will not work as effectively as possible, although previous MRs have made it more achievable. The element should be able to be loaded from the given unique_id, which the frontend process can pass back into the subprocess. The main issue here is that the Stream that exists in the 'backend' process needs to load the shell, whereas the current implementation can only notify the Scheduler in the 'backend' process from the 'frontend' process. For now I've added a commit to load the element instance via its name if the plugintable method does not work, which 'works'.
- Currently the last exception is, in some cases, stored against the frontend context object so that it can be used for further processing once the stream has returned (in App). This approach uses tblib to make the traceback object of the exception picklable. Where exactly we `install()` this functionality is still an open question, however.
- Interrupt handling 'works' on Linux: ctrl-c brings up the interrupt handler, and entering the given options then proceeds as expected. On Linux this now behaves as master does.
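For context on the tblib point above: by default, pickling an exception to send it between processes silently drops its `__traceback__`, which is why the branch needs `tblib.pickling_support.install()` (third-party, not shown here) to make the full traceback survive the round trip. A stdlib-only sketch of the underlying problem:

```python
import pickle
import traceback

def fail():
    raise ValueError("boom")

try:
    fail()
except ValueError as exc:
    # A plain pickle round trip preserves the exception but not its traceback:
    restored = pickle.loads(pickle.dumps(exc))
    # tblib.pickling_support.install() patches pickling so __traceback__
    # survives; a cruder stdlib fallback is carrying the formatted text:
    tb_text = traceback.format_exc()

print(type(restored).__name__, restored.__traceback__ is None)
```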
Changes proposed in this merge request:
- Add additional notifications to enable the test suite to pass
- Replace the already implemented deque with two multiprocessing queues, used for bidirectional notification
- An async event loop is created in the 'frontend' process when the subprocess is started; this event loop is used to poll the notification queues, and to watch the casd process if appropriate.
- With async in the frontend we can also add watchers for user interrupts. Currently this is set to handle the same signals that are handled in the scheduler subprocess (which still handles its own), however it might make sense to only handle ctrl-c in the frontend.
- Explicitly don't source-push if the related element failed to build; this was implied by the ordering of queues and comments in the test suite, but was not actually enforced.
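The frontend-side event loop described in the list above might look roughly like this: poll a notification queue from the scheduler subprocess without blocking the loop, and register a signal watcher on the same loop (Unix only; `add_signal_handler` is not available on Windows). All names here are illustrative assumptions, not the branch's actual code.

```python
import asyncio
import multiprocessing
import signal

received = []

def poll(queue, loop):
    # Drain any pending notifications without blocking the event loop
    while not queue.empty():
        received.append(queue.get_nowait())
    loop.call_later(0.01, poll, queue, loop)  # re-arm the poller

async def main(queue):
    loop = asyncio.get_running_loop()
    # Watch for ctrl-c in the frontend alongside queue polling
    loop.add_signal_handler(signal.SIGINT, lambda: received.append("interrupt"))
    loop.call_soon(poll, queue, loop)
    await asyncio.sleep(0.05)   # stand-in for the lifetime of the build

queue = multiprocessing.Queue()
queue.put("job-complete")       # stands in for a scheduler notification
asyncio.run(main(queue))
print(received)
```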
Current Analysis
As it stands, the branch shows a slight added overhead to the Linux build of ~4-6%, depending on whether a terminal is connected. This is somewhat expected and to some extent justifiable, as terminal performance is not perceived as a bottleneck on that platform. The WSL result, however, showing a decrease in build time of ~30%, is promising, if it can be shown to be reproducible across WSL target platforms with maintainable codepaths.
Latest benchmark, on WSL with a terminal (second pass, ~30% decrease in real time):
time bst --builders 4 --cache-buildtrees never build base-files/base-files.bst
tpollard/buildsubprocess
real 19m35.948s
user 12m25.234s
sys 31m52.188s
master
real 28m8.645s
user 7m22.438s
sys 19m42.078s
Latest benchmark, on Linux (Ubuntu 19.10) with a terminal (second pass, ~4% increase in real time):
time bst --builders 4 --cache-buildtrees never build base-files/base-files.bst
tpollard/buildsubprocess
real 6m53.579s
user 7m31.931s
sys 6m11.373s
master
real 6m34.438s
user 6m47.821s
sys 5m51.407s
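As a sanity check, the quoted percentages can be recomputed from the wall-clock (`real`) times in the two benchmarks above; the results are consistent with the ~30% WSL decrease and the ~4-6% Linux overhead range quoted earlier:

```python
# Recompute the percentage changes from the 'real' times above
def secs(minutes, seconds):
    return minutes * 60 + seconds

wsl_branch, wsl_master = secs(19, 35.948), secs(28, 8.645)
linux_branch, linux_master = secs(6, 53.579), secs(6, 34.438)

wsl_change = (wsl_master - wsl_branch) / wsl_master * 100      # vs master
linux_change = (linux_branch - linux_master) / linux_master * 100

print(f"WSL: {wsl_change:.1f}% decrease")     # ~30.4% decrease
print(f"Linux: {linux_change:.1f}% increase")  # ~4.9% increase
```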
Latest benchmark, on Linux (Ubuntu 19.10) running under a script (no terminal attached, so the `click.echo()` printing bottleneck is irrelevant; ~6% increase to build time). This benchmark is used as a relevancy check to ensure the overhead of adding the subprocessing machinery isn't hitting the non-'frontend output' case too harshly, i.e. running in CI or under a build script:
- Finished in: 13 mins, 16 secs
- Number of builders: [8]
- Target files: base-files/base-files.bst
- Number of runs: 1
- Number of warmups: 1
- Python versions: py37
| action | python_version | commit | median time (secs) | mean time (secs) ± std |
|:--------------|:-----------------|:------------------------------------|---------------------:|:-------------------------|
| build - 8 | py37 | master - 5c2fe471 | 178.85 | 178.85 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 189.87 | 189.87 ± nan |
| show | py37 | master - 5c2fe471 | 5.9 | 5.90 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 5.46 | 5.46 ± nan |
| show - cached | py37 | master - 5c2fe471 | 6.01 | 6.01 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 6.01 | 6.01 ± nan |
| action | python_version | commit | median max memory (MB) | mean max memory (MB) ± std |
|:--------------|:-----------------|:------------------------------------|-------------------------:|:-----------------------------|
| build - 8 | py37 | master - 5c2fe471 | 197.691 | 197.69 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 192.539 | 192.54 ± nan |
| show | py37 | master - 5c2fe471 | 200.57 | 200.57 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 172.379 | 172.38 ± nan |
| show - cached | py37 | master - 5c2fe471 | 200.543 | 200.54 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 200.707 | 200.71 ± nan |