Local source allows inclusion of random/dynamic data

Summary

There is a general desire to encode project source information (element files and refs, for example) into the output, so that one can easily discover what the output was built from. In the absence of better support, this can lead people to hack such information into their artifacts using local sources in an unreliable way, e.g. by encoding the project.refs file or the elements/ directory directly into an artifact.

This is unreliable in popular CI scenarios where we want to build the latest of everything: CI runs bst build --track-all <target.bst> to pick up the latest versions of everything maintained by a given working group and try to build it. These build scenarios are also where such hacks are most likely, because it is easier to store a project.refs file than to examine the master log file and view the tracking results of individual elements.

Note: this is very similar to #195 (closed), which was mostly fixed.

Why is this unreliable?

When a project.refs file is encoded into an artifact using a local source while build-time tracking is enabled, the following happens:

  • BuildStream starts up and calculates the cache key components for all local sources early in the process
  • Next, the elements are run through the queues in the regular way, causing selected elements to be tracked before they are built
  • The tracking results modify the project.refs file (or the element .bst files, depending on the project's ref storage configuration)
  • At some arbitrary later point, whenever an element which uses the said local source is built, the modified project.refs file is added to an artifact
  • The resulting artifact contains data which does not match the cache key it was created for
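The sequence above can be simulated in a few lines. This is a minimal sketch, not BuildStream code: it assumes a hypothetical `content_key` that is simply a SHA-256 of the local source's bytes, and a hypothetical `simulate_tracked_build` that stands in for the startup, tracking and build phases. The ref values written to the file are made up.

```python
import hashlib
import os
import tempfile


def content_key(data: bytes) -> str:
    # Hypothetical stand-in for a cache key: SHA-256 of the local source's content.
    return hashlib.sha256(data).hexdigest()


def simulate_tracked_build(refs_path: str) -> dict:
    # 1. Startup: the key for the local source is computed once, up front.
    with open(refs_path, "rb") as f:
        key = content_key(f.read())

    # 2. Tracking: a new ref is written back into project.refs
    #    (the ref values here are made up for illustration).
    with open(refs_path, "w") as f:
        f.write("ref: new-commit\n")

    # 3. Build: the element using the local source runs later and
    #    stages whatever the file contains *now* into the artifact.
    with open(refs_path, "rb") as f:
        payload = f.read()

    return {"cache_key": key, "payload": payload}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        refs = os.path.join(tmp, "project.refs")
        with open(refs, "w") as f:
            f.write("ref: old-commit\n")
        artifact = simulate_tracked_build(refs)
        # The staged payload no longer hashes to the key it was filed under.
        print(content_key(artifact["payload"]) == artifact["cache_key"])  # False
```

The point of the sketch is the ordering: the key is captured before tracking runs, but the payload is read after, so nothing ties the two together.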

What is the right way to get provenance information?

The correct way to address this need for encoded provenance information is to store the data in the artifact outside of the payload, and to allow examination of this data through the bst artifact commands.
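As a rough illustration of keeping provenance outside the payload, here is a sketch built on hypothetical names (`Artifact`, `compute_key`, `make_artifact`, `show_provenance`); it does not reflect BuildStream's actual artifact format. It assumes, as a simplification, that the key depends only on the resolved refs. Both the key and the provenance record are derived from a single snapshot of those refs, so they cannot drift apart, and the build payload is never touched.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Artifact:
    cache_key: str
    files: Dict[str, bytes]  # the build payload, exactly as produced
    provenance: Dict[str, str] = field(default_factory=dict)  # metadata, not payload


def compute_key(resolved_refs: Dict[str, str]) -> str:
    # Hypothetical simplification: the key depends only on the resolved refs.
    blob = json.dumps(resolved_refs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def make_artifact(resolved_refs: Dict[str, str], files: Dict[str, bytes]) -> Artifact:
    # Take one snapshot of the refs and derive both the key and the
    # provenance record from it, so the two cannot disagree.
    snapshot = dict(resolved_refs)
    return Artifact(cache_key=compute_key(snapshot), files=dict(files), provenance=snapshot)


def show_provenance(artifact: Artifact) -> str:
    # Hypothetical stand-in for examining the metadata via the bst artifact commands.
    return "\n".join(f"{name}: {ref}" for name, ref in sorted(artifact.provenance.items()))
```

For example, `make_artifact({"base.bst": "abc123"}, {"usr/bin/app": b"..."})` yields an artifact whose payload contains only the built file, while the refs are available separately for inspection.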

What is a reasonable workaround?

If, for example, you are running CI on GitLab, it makes sense to copy the resulting project.refs into the GitLab CI artifacts, so that you can view it after a build.

Just as people likely already store the log files, we can also store the modified project.refs (this is probably a good idea even once fancier bst artifact commands exist).

Steps to reproduce

Build a project with tracking enabled, using an element like this:

kind: import

sources:
- kind: local
  path: project.refs

What is the current bug behavior?

BuildStream simply allows this behavior.

What is the expected correct behavior?

BuildStream should error out when the project author tries to encode data which BuildStream knows can change as a result of a build.
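One possible shape for such a check is sketched below. It is hypothetical and not part of BuildStream's local source plugin: `LocalSourceError` and `check_local_source` are invented names, and the list of volatile paths (`project.refs` and an `elements` element directory, as in the examples above) is an assumption. A local source path is rejected if it is a volatile path, lies inside one, or contains one (e.g. `path: .` importing the whole project).

```python
from pathlib import PurePosixPath
from typing import Iterable


class LocalSourceError(Exception):
    """Hypothetical error for local sources that would capture volatile data."""


def check_local_source(path: str, volatile_paths: Iterable[str]) -> None:
    # PurePosixPath collapses "./" so "./project.refs" == "project.refs".
    candidate = PurePosixPath(path)
    for volatile in map(PurePosixPath, volatile_paths):
        if (
            candidate == volatile                # exactly the volatile file/dir
            or volatile in candidate.parents     # inside a volatile directory
            or candidate in volatile.parents     # a directory that contains it
        ):
            raise LocalSourceError(
                f"local source path '{path}' includes '{volatile}', "
                "which may change as a result of tracking"
            )
```

With `volatile_paths=["project.refs", "elements"]`, the element from the steps to reproduce would be rejected, while a path such as `files/config` would pass.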

Edited by Tristan Van Berkom