Scheduler requires significant rework
Some Background
The thoughts I've written here have mostly developed from investigation into #712 (closed), in particular from discussions I've had with other contributors (primarily @danielsilverstone-ct, @WSalmon and @juergbi) while trying to reason about the performance impact of patches to the scheduler.
The Problem
1: Scaling
As it stands, the scheduler code does not scale well at all: the overhead it introduces becomes significant for large projects, and it currently increases faster than linearly with the number of elements remaining in a pipeline.
@WSalmon created a nice demonstration of this using a simple set of empty import elements. Taking a pipeline of 5000 elements in a 'fan' graph from bstgen, the element throughput of the pipeline is almost an order of magnitude higher near the end than near the start. Note this holds even if one ignores either end to account for e.g. loading time. On his mid-range laptop, this took over 20 minutes to run, without any network access and with builds which should be instantaneous. Profiling shows that over 80% of this time is spent in queue.pop_ready_jobs.
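To see why the overhead grows faster than linearly, consider a scheduler that rescans the entire waiting list every time it looks for ready jobs. The sketch below is illustrative only (it is not BuildStream code, and `total_scan_work` is a made-up helper): each individual scan is linear, but the scans accumulate to roughly n²/2 element inspections over a whole run.

```python
def total_scan_work(num_elements: int, ready_per_tick: int = 1) -> int:
    """Count element inspections if each scheduling tick rescans all
    remaining elements and dispatches `ready_per_tick` of them."""
    remaining = num_elements
    inspections = 0
    while remaining > 0:
        inspections += remaining          # full scan of the waiting list
        remaining -= min(ready_per_tick, remaining)
    return inspections

# Doubling the pipeline roughly quadruples the total scanning cost,
# which is consistent with throughput being far higher near the end
# of a large run (the list being scanned has shrunk).
small = total_scan_work(1000)
large = total_scan_work(2000)
assert 3.5 < large / small < 4.5
```

This is only a model of the scaling behaviour, but it matches the profile: most ticks do a lot of scanning and dispatch very little work.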
Issue #703 (closed) is one 'real world' encounter with this issue and my initial profiling observations support the assertion that job queues are taking a large amount of the total runtime.
Given we want BuildStream to be able to handle pipelines containing tens of thousands of elements, the scaling ability of the scheduler clearly needs improving.
2: Resource Starvation
As observed in #712 (closed), the current attempt at avoiding resource starvation by reversing the order of the queues in Scheduler._schedule_queue_jobs does not work. This is because all pull jobs are ready to be scheduled immediately and therefore are guaranteed to be scheduled almost immediately. Any fetch jobs will be added to the ready_jobs list after the pull jobs and so will tend to wait until all pull jobs are complete.
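The pattern can be shown with a toy simulation. This is a deliberate simplification, not BuildStream internals: the names are invented, and it assumes every element's fetch job only becomes ready once its pull job has completed, with all jobs feeding one shared FIFO ready list.

```python
from collections import deque

def run(num_elements: int, tokens: int) -> list:
    """Simulate a shared FIFO ready list: every pull job is ready at
    tick 0; each fetch job becomes ready only after its pull completes."""
    ready = deque("pull-%d" % i for i in range(num_elements))
    order = []
    while ready:
        for _ in range(min(tokens, len(ready))):
            job = ready.popleft()
            order.append(job)
            if job.startswith("pull-"):
                # The corresponding fetch job joins the ready list
                # *behind* every pull job that is still waiting.
                ready.append("fetch-" + job.split("-")[1])
    return order

# Every pull job runs before any fetch job, regardless of how many
# job tokens are available - reversing queue order cannot change this,
# because the ordering is imposed by when jobs become ready.
order = run(4, 2)
assert order[:4] == ["pull-0", "pull-1", "pull-2", "pull-3"]
assert all(job.startswith("fetch-") for job in order[4:])
```

The ordering here is a property of readiness times, not of queue iteration order, which is why the reversal in Scheduler._schedule_queue_jobs has no effect.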
Moving Forward
Refactoring
We're encountering enough pain here now that I believe it is worth thinking carefully about our approach to the scheduler as a whole. I've not yet spent much time thinking about alternative scheduler implementations, so I don't have much to offer on this front for now.
Benchmarking
As is demonstrated by #712 (closed) and #703 (closed), the synthetic benchmarks we have been using have not been representative of the large real world projects we are targeting. Before making any major changes we should ideally have some more realistic benchmarks which can be used to guide us.
@wsalmon has been doing some work on expanding bstgen to create more complex dependency trees. If we can show that these exhibit the same performance issues as are being encountered in real projects, then they could be used as easily repeatable, low-cost benchmarks that we know reflect real-world projects.
I believe @jennis has been working on constructing a benchmarking project from the Debian dependency graph, which could prove to be very valuable.
note: Additional links and more statistics to follow.