
Improve handling of frontend crashes

This branch makes the core more defensive about how frontends are implemented in BuildStream, and handles frontend crashes more gracefully (even though we only have one frontend right now, the BuildStream core doesn't know that... it will just be our little secret for now).

Specifically, this branch:

  • Improves fault tolerance around killing jobs

  • Ensures that jobs are started and added to the active_jobs list atomically (because of the GIL, Python does not support threading in any way that would make these two lines non-atomic anyway, so starting the job after adding it to the list of jobs is effectively an atomic operation as far as it matters).

  • Ensures that we schedule jobs with the expectation that starting a job might result in that job being forcefully terminated due to a possible crash in any frontend implementation handling notifications.
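
As a rough illustration of the three points above, here is a minimal sketch. It is not the actual BuildStream code: `Job`, `Scheduler`, `start_job`, `terminate_jobs` and the `notify` callback are hypothetical names, and only `active_jobs` comes from the description. It assumes the frontend is reached through a callback that may raise while a job is being started.

```python
class Job:
    """Hypothetical stand-in for a scheduler job."""

    def __init__(self, name):
        self.name = name
        self.started = False
        self.terminated = False

    def start(self, notify):
        self.started = True
        # The frontend callback runs here and may raise.
        notify("job-started", self)

    def kill(self):
        # Killing is safe to call on a job in any state.
        self.terminated = True


class Scheduler:
    """Hypothetical scheduler, reduced to the parts discussed above."""

    def __init__(self, notify):
        self.active_jobs = []
        self._notify = notify

    def start_job(self, job):
        # Register the job first, so that a crash during start() still
        # leaves it visible to the termination path below.
        self.active_jobs.append(job)
        try:
            job.start(self._notify)
        except Exception:
            # The frontend crashed while handling the notification:
            # terminate every job we know about, then re-raise.
            self.terminate_jobs()
            raise

    def terminate_jobs(self):
        # Iterate over a copy, since we remove entries as we go.
        for job in list(self.active_jobs):
            try:
                job.kill()
            except Exception:
                # One job failing to die must not block the others.
                pass
            finally:
                self.active_jobs.remove(job)
```

The ordering in `start_job` is the design choice that matters here: a job which is already in `active_jobs` when the frontend crashes can still be cleaned up, whereas a job started before being registered could be leaked.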

This branch fixes #1312 (closed)

Unfortunately, I am not able to provide a test case for this at this time. However, if we reduce the frontend-facing API surface and define it better, it could make sense to add a test suite of phony frontend implementations which misbehave in various ways, triggering errors which we cannot otherwise test for.
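
To give an idea of what such a suite might look like, here is a small sketch. Everything in it is hypothetical: the phony frontends, the `notify_safely` wrapper and the `"job-started"` event name are assumptions for illustration, not part of the current frontend-facing API.

```python
def well_behaved(event):
    # A frontend that handles the notification and returns nothing.
    return None


def raises_on_notify(event):
    # A frontend that crashes while handling the notification.
    raise RuntimeError("phony frontend crashed")


def returns_garbage(event):
    # A frontend that returns something the core does not expect.
    return object()


PHONY_FRONTENDS = [well_behaved, raises_on_notify, returns_garbage]


def notify_safely(frontend, event):
    """Hypothetical core-side wrapper around a frontend callback.

    Returns True only if the frontend handled the event cleanly.
    """
    try:
        result = frontend(event)
    except Exception:
        return False
    return result is None


def run_suite():
    # Map each phony frontend's name to whether it behaved correctly.
    return {f.__name__: notify_safely(f, "job-started") for f in PHONY_FRONTENDS}
```

Each phony frontend would exercise one failure mode, and the suite would assert that the core survives it.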
