WIP: Run bst build under a multiprocessing subprocess
Description
This is currently a PoC implementation showing `bst build` running under Python's `multiprocessing` in a subprocess, utilising Queues to handle notifications between the 'front' and 'back' end processes. The tests currently pass, but thought needs to be given to the merge path for this workstream, as well as to ensuring there is a benefit (i.e. increased performance) to doing so. The subprocess handling and notification polling are very rudimentary and need to be fully assessed. This is part of #1036 (closed). The main aim is to move the overhead of calling `click.echo`
into a separate process from the scheduler, hopefully allowing more 'work' to be done and leading to quicker build times on WSL by utilising more CPU time across processes.
Known issues:
- `widget.py` print summary needs access to resolved keys at the end of the run where keys have been acquired, e.g. tracked elements.
- Shelling into a failed build element will not work as effectively as possible, although previous MRs have made it more achievable. The element should be able to be loaded from the given unique_id, which the frontend process can pass back into the subprocess. The main issue here is that the Stream that exists in the 'backend' process needs to load the shell, whereas the current implementation can only notify the Scheduler in the 'backend' process from the 'frontend' process. For now I've added a commit to load the element instance via its name if the plugintable method does not work, which 'works'.
- Currently the last exception is, in some cases, stored against the frontend context object so that it can be used for further processing once the stream has returned (in App). This approach uses tblib to make the traceback object of the exception picklable. Where exactly we `install()` this functionality is still an open question, however.
- Interrupt handling 'works' on Linux: ctrl-c brings up the interrupt handler, and entering the given options then proceeds as expected. On Linux this now behaves as master does.
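For context on the tblib point above: by default, pickling an exception to send it between processes silently drops its `__traceback__`, which is why the branch needs `tblib.pickling_support.install()` (third-party, not shown here) to make the full traceback survive the round trip. A stdlib-only sketch of the underlying problem:

```python
import pickle
import traceback

def fail():
    raise ValueError("boom")

try:
    fail()
except ValueError as exc:
    # A plain pickle round trip preserves the exception but not its traceback:
    restored = pickle.loads(pickle.dumps(exc))
    # tblib.pickling_support.install() patches pickling so __traceback__
    # survives; a cruder stdlib fallback is carrying the formatted text:
    tb_text = traceback.format_exc()

print(type(restored).__name__, restored.__traceback__ is None)
```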
Changes proposed in this merge request:
- Add additional notifications to enable the test suite to pass
- Replace the already implemented deque with two multiprocessing queues, used for bidirectional notification
- An async event loop is created in the 'frontend' process when the subprocess is started; this event loop is used to poll the notification queues, and to watch the casd process if appropriate.
- With async in the frontend we can also add watchers for user interrupts. Currently this is set to handle the same signals that are handled in the scheduler subprocess (which still handles its own), however it might make sense to only handle ctrl-c in the frontend.
- Explicitly don't source-push if the related element failed to build; this was implied by the ordering of queues and comments in the test suite, but was not actually enforced.
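The frontend-side event loop described in the list above might look roughly like this: poll a notification queue from the scheduler subprocess without blocking the loop, and register a signal watcher on the same loop (Unix only; `add_signal_handler` is not available on Windows). All names here are illustrative assumptions, not the branch's actual code.

```python
import asyncio
import multiprocessing
import signal

received = []

def poll(queue, loop):
    # Drain any pending notifications without blocking the event loop
    while not queue.empty():
        received.append(queue.get_nowait())
    loop.call_later(0.01, poll, queue, loop)  # re-arm the poller

async def main(queue):
    loop = asyncio.get_running_loop()
    # Watch for ctrl-c in the frontend alongside queue polling
    loop.add_signal_handler(signal.SIGINT, lambda: received.append("interrupt"))
    loop.call_soon(poll, queue, loop)
    await asyncio.sleep(0.05)   # stand-in for the lifetime of the build

queue = multiprocessing.Queue()
queue.put("job-complete")       # stands in for a scheduler notification
asyncio.run(main(queue))
print(received)
```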
Current Analysis
As it stands, the branch shows a slight added overhead to the Linux build of ~4-6%, depending on whether a terminal is connected. This is somewhat expected and to some extent justifiable, as terminal performance is not perceived as a bottleneck on that platform. The WSL result, however, showing a decrease in build time of ~30%, is promising, if it can be shown to be reproducible across WSL target platforms with maintainable codepaths.
Latest benchmark, on WSL with a terminal (second pass, ~30% decrease in real time):
time bst --builders 4 --cache-buildtrees never build base-files/base-files.bst
tpollard/buildsubprocess
real 19m35.948s
user 12m25.234s
sys 31m52.188s
master
real 28m8.645s
user 7m22.438s
sys 19m42.078s
Latest benchmark, on Linux (Ubuntu 19.10) with a terminal (second pass, ~4% increase in real time):
time bst --builders 4 --cache-buildtrees never build base-files/base-files.bst
tpollard/buildsubprocess
real 6m53.579s
user 7m31.931s
sys 6m11.373s
master
real 6m34.438s
user 6m47.821s
sys 5m51.407s
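As a sanity check, the quoted percentages can be recomputed from the wall-clock (`real`) times in the two benchmarks above; the results are consistent with the ~30% WSL decrease and the ~4-6% Linux overhead range quoted earlier:

```python
# Recompute the percentage changes from the 'real' times above
def secs(minutes, seconds):
    return minutes * 60 + seconds

wsl_branch, wsl_master = secs(19, 35.948), secs(28, 8.645)
linux_branch, linux_master = secs(6, 53.579), secs(6, 34.438)

wsl_change = (wsl_master - wsl_branch) / wsl_master * 100      # vs master
linux_change = (linux_branch - linux_master) / linux_master * 100

print(f"WSL: {wsl_change:.1f}% decrease")     # ~30.4% decrease
print(f"Linux: {linux_change:.1f}% increase")  # ~4.9% increase
```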
Latest benchmark, on Linux (Ubuntu 19.10) running under a script (no terminal attached, so the `click.echo()` printing bottleneck is irrelevant; ~6% increase to build time). This benchmark is used as a relevancy check to ensure the overhead of adding the subprocessing machinery isn't hitting the non-'frontend output' case too harshly, i.e. running in CI or under a build script:
- Finished in: 13 mins, 16 secs
- Number of builders: [8]
- Target files: base-files/base-files.bst
- Number of runs: 1
- Number of warmups: 1
- Python versions: py37
| action | python_version | commit | median time (secs) | mean time (secs) ± std |
|:--------------|:-----------------|:------------------------------------|---------------------:|:-------------------------|
| build - 8 | py37 | master - 5c2fe471 | 178.85 | 178.85 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 189.87 | 189.87 ± nan |
| show | py37 | master - 5c2fe471 | 5.9 | 5.90 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 5.46 | 5.46 ± nan |
| show - cached | py37 | master - 5c2fe471 | 6.01 | 6.01 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 6.01 | 6.01 ± nan |
| action | python_version | commit | median max memory (MB) | mean max memory (MB) ± std |
|:--------------|:-----------------|:------------------------------------|-------------------------:|:-----------------------------|
| build - 8 | py37 | master - 5c2fe471 | 197.691 | 197.69 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 192.539 | 192.54 ± nan |
| show | py37 | master - 5c2fe471 | 200.57 | 200.57 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 172.379 | 172.38 ± nan |
| show - cached | py37 | master - 5c2fe471 | 200.543 | 200.54 ± nan |
| | py37 | tpollard/buildsubprocess - 86764109 | 200.707 | 200.71 ± nan |