Element._update_state() does more stuff than it needs to when it's called

Summary

Element._update_state() is a very long function that does a lot of work and is called frequently, often for very different reasons.

The full list of callers, with the reason for each call, is:

  • Pipeline.resolve_elements(), so that we know the initial state of the elements.
  • Stream._run(), so that we know the state of the elements after a run (to provide a useful summary).
  • {Build,Fetch,Pull}Queue.status(), in case changes to dependencies have changed the element's state. Used by Queue.harvest_jobs() to determine whether the job should still be run.
  • FetchQueue.done(), to check whether the element is now cached.
  • Element._schedule_tracking(), to set the element state to be inconsistent.
  • Element._tracking_done(), to set the cache key now that the element has been tracked.
  • Element._set_required(), to schedule assembly if certain conditions are met.
  • Element._schedule_assemble(), to synchronise the element state before it gets used in a subprocess.
  • Element._assemble_done(), to synchronise the element state after the subprocess has finished its job.
  • Element._pull_done(), to synchronise the element state after a pull attempt.

In other words, the purposes are broadly:

  • Calculating the initial state
  • Scheduling assembly
  • Deciding whether a job should be run
  • Deciding whether the whole dependency tree should be recalculated
  • Synchronising the state because of things that have happened / will happen in a subprocess
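To make the redundancy concrete, here is a toy model rather than BuildStream code. It assumes a monolithic update that always runs the same fixed set of steps, and counts how many of those steps each kind of caller did not need. The class ToyElement, the step names in STEPS, and the reason-to-step mapping in NEEDED are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical steps that a monolithic state update might perform.
STEPS = ("check_sources", "compute_cache_key", "query_cache", "schedule_assembly")

# Hypothetical mapping from the caller's reason to the steps it really needs.
NEEDED = {
    "initial_state": set(STEPS),
    "fetch_done": {"check_sources", "query_cache"},
    "tracking_done": {"compute_cache_key"},
    "set_required": {"schedule_assembly"},
}


class ToyElement:
    """Toy stand-in for an element; not the BuildStream Element class."""

    def __init__(self):
        self.steps_run = Counter()     # steps executed, per reason
        self.steps_wasted = Counter()  # steps the caller did not need, per reason

    def update_state(self, reason):
        # Monolithic update: every step runs, whatever the caller's reason.
        for step in STEPS:
            self.steps_run[reason] += 1
            if step not in NEEDED[reason]:
                self.steps_wasted[reason] += 1


element = ToyElement()
for reason in NEEDED:
    element.update_state(reason)

for reason in NEEDED:
    print(reason, element.steps_wasted[reason], "of",
          element.steps_run[reason], "steps were not needed")
```

In this model only the initial-state call needs every step; the narrower callers pay for work they never use. Real numbers would have to come from profiling the actual function.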

Possible fixes

Because Element._update_state() is called frequently and does a lot of work on every call, reducing the redundant work done per call is likely to improve performance.
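One possible shape for such a fix, sketched under assumptions: split the update into narrow steps so that each caller runs only the step it needs, and keep a full update for the initial state. Nothing below is BuildStream's API. ToyElement, update_cache_key(), update_cached(), update_all(), and the placeholder cache key are hypothetical names used only to illustrate the idea.

```python
class ToyElement:
    """Toy stand-in for an element; not the BuildStream Element class."""

    def __init__(self, name):
        self.name = name
        self.cache_key = None
        self.cached = False
        self.work_log = []  # records which steps actually ran

    # Narrow steps: each one does a single job.
    def update_cache_key(self):
        self.work_log.append("cache_key")
        self.cache_key = "key-for-" + self.name  # placeholder, not a real key

    def update_cached(self, artifact_keys):
        self.work_log.append("cached")
        self.cached = self.cache_key in artifact_keys

    # Full update, kept for callers that need the complete initial state.
    def update_all(self, artifact_keys):
        self.update_cache_key()
        self.update_cached(artifact_keys)

    # Hypothetical callers that only trigger the step they depend on.
    def tracking_done(self):
        self.update_cache_key()

    def fetch_done(self, artifact_keys):
        self.update_cached(artifact_keys)


element = ToyElement("app")
element.update_all({"key-for-app"})  # initial state: both steps run
initial_log = list(element.work_log)

element.work_log.clear()
element.fetch_done(set())            # after a fetch: only the cache check runs
fetch_log = list(element.work_log)
```

Whether the real function can be divided this cleanly depends on how its steps depend on one another, which this sketch does not model.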

Other relevant information

  • BuildStream version affected: /milestone %BuildStream_v1.x
