Workspaces via CAS
Background
I am converting some notes from the October 2018 gathering into a ticket for safekeeping:
The proposal is to replace bind-mounting workspaces with synchronizing sources from the workspace directory to CAS, keeping object files only in CAS as part of the cached build tree. The plan is to implement this after BuildBox is available for local execution, either as part of or following the SourceCache effort. For compatibility, only do this for plugins that advertise BST_VIRTUAL_DIRECTORY.
This will eliminate the special cache key handling currently required for workspaces. That is, it will no longer be necessary to invalidate the cache key before a workspace build or to recalculate the cache key after a workspace build.
This was also mentioned at https://mail.gnome.org/archives/buildstream-list/2019-July/msg00037.html
Implementation
This was refined in https://mail.gnome.org/archives/buildstream-list/2019-October/msg00000.html
For a local workspace build the run would proceed something like the following:
- create a metasource for the workspace source plugin during element collection and allow that workspace source to own the element-specific source objects.
- when getting the unique key of the workspace source, the workspace tree will be imported into the sourcecache and the unique key will be returned as the tuple (path, digest). This will remove the segregation of workspaces and other sources when calculating the element cache key (#1073 (closed)).
- The source will be cached during staging and should be imported via casd into the remote CAS to support RE. This should remove the need to clear workspace cache data (#1088 (closed)) and the need for the concept of unstable cache keys.
- Supporting incremental builds will require:
  - node properties and integration into the buildbox infrastructure
  - a mechanism capable of efficiently producing diffs of source trees
  - a mechanism capable of efficiently applying diffs of source trees
  - a mechanism to track the immediate previous history of the workspace. The source digest of the previous workspace must be tracked, along with the dependency hash (if that changes, an incremental build is not possible) and the artefact ref of the previous workspace. These should be stashed regardless of the outcome of the previous build. If any of these are not available, the build must be non-incremental.
  - a mechanism to recover cached workspaces
- The generic scheme for an incremental workspace build would then be (given current workspace state y, and stored input state x => T_x where f(x) = T_x):
  - verify that an incremental build is possible:
    - the workspace config must provide a previous dependency hash and a cached artifact
    - the input tree is a subset of the buildtree. If the build process removes or alters the input then this cannot be accurately reflected into the input tree for the incremental build and a full build is necessary. To be completely accurate we should still schedule the build even when the buildtree and the input tree are identical, since the build system should render this a no-op and we cannot know whether any installation or other scripted process exists that is not reflected in the buildtree. A criterion for this is that h^-1(T_x, x) = T_x
    - the dependency hash of the current input tree must be equivalent to the previous dependency hash
  - apply the previous buildtree to the input tree: h^-1(y, T_x) => y'
  - apply the process to that new input state: f(y') = T_y'

  where f is the build function and h is the diff function.
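
The incremental scheme above can be sketched with trees modelled as plain dictionaries mapping a path to a content digest. This is only an illustration under stated assumptions: the function names (diff, apply_delta, can_build_incrementally, prepare_incremental_input) are hypothetical and not BuildStream or buildbox API, node properties such as mtimes are ignored, and y' is formed by applying the developer's delta (x to y) on top of the previous buildtree T_x, which is one possible reading of "apply the previous buildtree to the input tree".

```python
def diff(old, new):
    """h: describe how to turn tree `old` into tree `new`."""
    return {
        "added": {p: d for p, d in new.items() if p not in old},
        "modified": {p: d for p, d in new.items() if p in old and old[p] != d},
        "removed": sorted(p for p in old if p not in new),
    }


def apply_delta(tree, delta):
    """h^-1: apply a delta produced by diff() to `tree`, returning a new tree."""
    result = dict(tree)
    for path in delta["removed"]:
        result.pop(path, None)
    result.update(delta["added"])
    result.update(delta["modified"])
    return result


def can_build_incrementally(x, T_x, prev_dep_hash, dep_hash, prev_artifact):
    """Check the preconditions listed in the scheme."""
    # The workspace config must provide the previous dependency hash and artifact.
    if prev_dep_hash is None or prev_artifact is None:
        return False
    # The dependency hash must not have changed since the previous build.
    if prev_dep_hash != dep_hash:
        return False
    # The input tree must be an unaltered subset of the buildtree.
    return all(T_x.get(path) == digest for path, digest in x.items())


def prepare_incremental_input(x, y, T_x):
    """Return y': the previous buildtree with the developer's changes applied."""
    developer_delta = diff(x, y)
    return apply_delta(T_x, developer_delta)


# Previous input x, its buildtree T_x, and the current workspace state y.
x = {"configure.ac": "a1", "hello.c": "c1"}
T_x = {"configure.ac": "a1", "hello.c": "c1", "Makefile": "m1", "hello.o": "o1"}
y = {"configure.ac": "a1", "hello.c": "c2"}

possible = can_build_incrementally(x, T_x, "deps-1", "deps-1", "artifact-ref")
y_prime = prepare_incremental_input(x, y, T_x)
```

Here y_prime keeps the generated Makefile and hello.o from the previous build while carrying the edited hello.c, so the build function f only has to redo the work that depends on the change.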
behavioural changes
- builds no longer affect the locally opened workspace, so keys should no longer be unstable or reset. The keys are expected to be identical following a build unless there is a local change to source files.
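
As a rough illustration of why the key stays stable, the sketch below derives a (path, digest) tuple from the contents of a directory. It is an assumption-laden stand-in: tree_digest and workspace_unique_key are hypothetical helpers using hashlib, not the CAS import performed by casd, and file properties such as mtimes are deliberately left out of the digest.

```python
import hashlib
import os
import tempfile


def tree_digest(root):
    """Return a hex digest covering relative paths and file contents under root."""
    outer = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as f:
                content_hash = hashlib.sha256(f.read()).hexdigest()
            outer.update(rel.encode() + b"\0" + content_hash.encode() + b"\n")
    return outer.hexdigest()


def workspace_unique_key(path):
    # Hypothetical stand-in for the workspace source's unique key.
    return (path, tree_digest(path))


with tempfile.TemporaryDirectory() as workspace:
    source = os.path.join(workspace, "hello.c")
    with open(source, "w") as f:
        f.write("int main(void) { return 0; }\n")

    before = workspace_unique_key(workspace)
    # A build that leaves the workspace untouched yields the same key.
    unchanged = workspace_unique_key(workspace)

    # A local edit to a source file changes the digest, and so the key.
    with open(source, "a") as f:
        f.write("/* local edit */\n")
    after_edit = workspace_unique_key(workspace)
```

Because the build output never lands in the workspace directory, recomputing the key after a build gives the same tuple, and only a local edit produces a new one.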
follow-up considerations:
- How will this implementation affect running files?
- In the case of conflicts when applying the developer's delta, this delta should be forced. In case of conflicts when applying the build-tree delta this should produce a fatal error.
- How will files with the same content and different properties be handled?
- how to optimize configuration of workspaces (for elements that need to be prepared)? Currently, to immediately support non-incremental RE builds with workspaces, we need to recalculate the workspace source key each time (restaging the workspace into the CAS, since some files may have changed); duplicate imports should be optimized partly by casd. We also need to prepare the sandbox each time, since the result of that step is not an input to the build (e.g. Makefiles may not be present in the workspace before triggering the build). This is the most correct way to handle it, since we don't know how changes to the open workspace would affect the result of any configuration. However, if we want to optimize it away then we will need to do something like the following:
  - after preparing the sandbox, make sure this new state is cached. This means there will be a new key for the source which corresponds to the workspace after configuration.
  - set the workspace source key to this value and additionally save it in the workspace.yml. We will also need to map the workspace key to the 'prepared' workspace key so that on recalculation we can associate the two and avoid re-configuring.
  - when the workspace is reopened or rebuilt, the mapping should be available to the element. The key is calculated by importing the workspace into the CAS, and this is checked against the loaded key association. If the pre-configure key matches, the source key should be set to the saved post-configure key, which will be used to find the cached configured workspace.
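
The key association described in the last point could look roughly like the sketch below. The names resolve_source_key and record_prepared_key, and the shape of the mapping, are hypothetical; in practice the mapping would be persisted alongside the workspace configuration rather than held in a plain dictionary.

```python
def record_prepared_key(prepared_keys, pre_configure_key, post_configure_key):
    """Remember which configured state belongs to a given imported workspace key."""
    prepared_keys[pre_configure_key] = post_configure_key


def resolve_source_key(imported_key, prepared_keys):
    """Return (source_key, needs_configure) for a freshly imported workspace.

    If the imported key matches a stored pre-configure key, reuse the saved
    post-configure key so the cached configured workspace can be found.
    """
    prepared = prepared_keys.get(imported_key)
    if prepared is not None:
        return prepared, False
    return imported_key, True


# Mapping loaded from the workspace configuration (empty on first build).
prepared_keys = {}

first = resolve_source_key("ws-key-1", prepared_keys)
record_prepared_key(prepared_keys, "ws-key-1", "prepared-key-1")

# Reopening the unchanged workspace finds the configured state.
reopened = resolve_source_key("ws-key-1", prepared_keys)

# Any local change gives a new imported key, so configuration runs again.
changed = resolve_source_key("ws-key-2", prepared_keys)
```

Keying the association on the pre-configure key means any change to the open workspace falls back to the fully correct path of preparing the sandbox again.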
Tasks
for plugins that advertise BST_VIRTUAL_DIRECTORY:
- MR1 (!1563 (merged))
  - workspaces via sourcecache 1/2:
    - append 'workspace' source during element collection and store the element sources. Forbid manual use of the workspace plugin.
    - add minimal 'workspace' source plugin
    - reset workspaces using the original sources held by the workspace source object
    - remove additional handling for workspaces in functions responsible for loading/staging.
  - remove workspace cache clearing (#1088 (closed))
  - remove workspace handling in element cache key calculation (partial reversion of #1073 (closed))
followups:
- MR2 (!1640 (merged))
  - workspaces via sourcecache 2/2:
    - reset workspaces using the original sources held by the workspace source object
    - re-initialising open workspaces should raise a SourceError (will require changes for workspace resetting) (#1140 (closed))
    - optimisations for show-cached (#1143 (closed)) (!1612 (merged))
- MR2.5 (!1653 (merged))
  - remove special loading for workspace sources. Now that workspace init_workspace raises an exception (!1640 (merged)), there is no need to retain the original element sources, and much of that handling in loader.py and element.py can be removed
- MR3 (!1682 (merged))
  - workspaces via RE: support remote execution for opened workspaces (non incremental) (#933 (closed))
  - Add test: a series of builds of the autotools plugin for amhello to the remote execution suite. This should assert that the following occurs:
    - autotools amhello can be built remotely from a local workspace: after the build, the artifact cache should contain the artifacts including the generated files (such as the makefile). The makefile should be newer than the input file and the objects should be newer than the makefile. The modified buildtree should be imported into CAS and the local workspace should remain unchanged.
    - the project, when rebuilt, results in no additional work and no files are modified. The workspace and CAS should remain unchanged.
    - touching a file results in a full rebuild
- MR4 (!1726 (closed))
  - BuildGrid/buildbox/buildbox-common#32
  - BuildGrid/buildbox/buildbox-casd#40 (closed)
  - #1216 (closed) (!1761 (merged))
  - #1247 (closed)
  - changes for incremental workspace builds:
    - on every workspace operation, stash the artefact ref and dependency hash, recover these on next workspace
    - diff creation/application mechanism (!1769 (merged))
    - diff/apply workspace trees when preparing for build
    - open workspaces respecting mtimes if sources were cached with these (this is free provided this is supported in Source.stage and we copy rather than hardlink files)
  - Add test: a series of builds of the autotools plugin for amhello to the remote execution suite. This should assert that the following occurs:
    - autotools amhello can be built remotely from a local workspace: after the build, the artifact cache should contain the artifacts including the generated files (such as the makefile). The makefile should be newer than the input file and the objects should be newer than the makefile. The modified buildtree should be imported into CAS and the local workspace should remain unchanged.
    - the project, when rebuilt, results in no additional work and no files are modified. The workspace and CAS should remain unchanged.
    - touching a file results in an incremental rebuild and appropriately modified timestamps in CAS. The workspace should be unchanged apart from the touched file. The touched file should be correctly imported into CAS and reflected in the modified buildtree.
    - a content change of a source file results in an incremental rebuild and appropriately modified timestamps. Apart from local changes, the workspace should remain unchanged; the changed source should be imported into CAS and reflected in the modified buildtree.
- MR5
  - deprecate _workspace.py: chunks of this module may be removed following previous MRs, some parts will be subject to implementation changes in previous MRs
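
The timestamp assertions listed for the MR3 and MR4 tests (makefile newer than the input file, objects newer than the makefile) could be checked with a helper along these lines. This is a sketch only: build_order_is_consistent is a hypothetical name, Makefile.in is assumed as the "input file", and the files are created locally with explicit mtimes rather than checked out from a buildtree in CAS.

```python
import os
import tempfile


def mtime_ns(path):
    return os.stat(path).st_mtime_ns


def build_order_is_consistent(input_file, makefile, objects):
    """True if input_file < makefile < every object, comparing mtimes."""
    make_time = mtime_ns(makefile)
    if make_time <= mtime_ns(input_file):
        return False
    return all(mtime_ns(obj) > make_time for obj in objects)


def demo():
    base = 1_600_000_000 * 10**9  # arbitrary fixed point in time, in nanoseconds
    with tempfile.TemporaryDirectory() as tree:
        paths = {}
        # Create the files with strictly increasing mtimes, as a build would.
        for step, name in enumerate(["Makefile.in", "Makefile", "hello.o"]):
            path = os.path.join(tree, name)
            with open(path, "w") as f:
                f.write(name)
            stamp = base + step * 10**9
            os.utime(path, ns=(stamp, stamp))
            paths[name] = path

        fresh = build_order_is_consistent(
            paths["Makefile.in"], paths["Makefile"], [paths["hello.o"]]
        )

        # Touching the input makes the generated files stale.
        touched = base + 10 * 10**9
        os.utime(paths["Makefile.in"], ns=(touched, touched))
        stale = build_order_is_consistent(
            paths["Makefile.in"], paths["Makefile"], [paths["hello.o"]]
        )
        return fresh, stale
```

Calling demo() returns the result of the check before and after the input file is touched, which mirrors the "touching a file" cases in the test descriptions.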