Resolve "Calculate sha256 digest of artifact on PublishProvenanceService"
Background
Definitions
- Job Artifacts: One or more files generated by a CI job and saved as job artifacts, as declared by the artifacts directive in the CI configuration.
- Artifacts Bundle: The internal storage mechanism GitLab uses to persist job artifacts in the backend. The bundle is a zip file containing all the individual job artifacts for a particular job.
What are "SLSA provenance statements"?
The group::pipeline security team is working towards providing users with SLSA Level 3 Provenance Attestations. The SLSA documentation describes provenance as follows:
It’s the verifiable information about software artifacts describing where, when, and how something was produced. For higher SLSA levels and more resilient integrity guarantees, provenance requirements are stricter and need a deeper, more technical understanding of the predicate. Describe how an artifact or set of artifacts was produced so that:
- Consumers of the provenance can verify that the artifact was built according to expectations.
- Others can rebuild the artifact, if desired.
As a simplified TL;DR, in the context of GitLab, a provenance statement is a JSON document that correlates the SHA-256 digest of an artifact with the build information. A worker then digitally signs this statement, and the signed result is called a provenance attestation. This is a highly sought-after feature, particularly for our GitLab Ultimate customers.
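To make the shape of such a statement concrete, here is a minimal sketch, not the actual service code. It assumes the in-toto Statement v1 layout with a SLSA v1 provenance predicate; the helper name `build_provenance_statement`, the sample content, and the build type URL are hypothetical placeholders.

```ruby
require "digest"
require "json"

# Hypothetical helper (not part of GitLab): builds an in-toto style
# statement that ties one artifact's SHA-256 digest to build information.
def build_provenance_statement(path:, content:, build_info:)
  {
    "_type" => "https://in-toto.io/Statement/v1",
    "subject" => [
      {
        "name" => path,
        "digest" => { "sha256" => Digest::SHA256.hexdigest(content) }
      }
    ],
    "predicateType" => "https://slsa.dev/provenance/v1",
    "predicate" => build_info
  }
end

statement = build_provenance_statement(
  path: "test.txt",
  content: "hello\n",
  # Placeholder predicate; a real one carries far more build detail.
  build_info: {
    "buildDefinition" => { "buildType" => "https://example.com/hypothetical-build-type" }
  }
)

puts JSON.pretty_generate(statement)
```

The key point is that the `subject` digest is computed over the individual artifact, which is the value a consumer can reproduce from the file they actually downloaded.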
Why is this change required
We are required to generate provenance attestations that correlate build information with the hash of a specific artifact. Our current implementation, however, attests the artifacts bundle instead. This has some disadvantages:
- The "artifacts bundle" is a mechanism that GitLab uses internally that has no meaning to our users. For example, they generally do not distribute this bundle to their users, but rather distribute the artifacts themselves.
- We do not currently store a correlation between the SHA-256 of the individual artifacts and an artifacts bundle, which makes it impossible to achieve the desired architecture, in particular requirements such as "The API is queried with the SHA-256 of the artifact and returns the Sigstore bundle if found".
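The lookup requirement quoted above can be illustrated with a small sketch. This is an in-memory stand-in under stated assumptions, not the planned storage or API design: the class `AttestationIndex`, its methods, and the placeholder bundle string are all hypothetical.

```ruby
require "digest"

# Hypothetical in-memory stand-in for a persisted mapping from an
# artifact's SHA-256 digest to its attestation (Sigstore bundle).
class AttestationIndex
  def initialize
    @by_digest = {}
  end

  # Records an attestation keyed by the digest of the artifact itself,
  # not by the digest of the artifacts bundle that contains it.
  def record(path:, content:, sigstore_bundle:)
    digest = Digest::SHA256.hexdigest(content)
    @by_digest[digest] = { path: path, sigstore_bundle: sigstore_bundle }
    digest
  end

  # Returns the stored entry, or nil when the digest is unknown.
  def find(digest)
    @by_digest[digest]
  end
end

index = AttestationIndex.new
digest = index.record(
  path: "test.txt",
  content: "hello\n",
  sigstore_bundle: "placeholder-bundle"
)

found = index.find(digest)
missing = index.find(Digest::SHA256.hexdigest("something else"))
```

A user who only has the distributed file can compute its digest and query with it, which is not possible when only the bundle's digest is stored.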
References
- Issue: Calculate sha256 digest of artifact on PublishProvenanceService
- ADR MR: ADR 005: SLSA SHA-256 hashing location
- ADR: ADR 005: Perform sha256 calculation in PublishProvenanceService
How to set up and validate locally
Run the following in the rails console:
[4] pry(main)> build = Ci::Build.last
[...]
[4] pry(main)> pps = Ci::Slsa::PublishProvenanceService.new(build)
=> #<Ci::Slsa::PublishProvenanceService:0x0000000329fd19a8
@build=
[...]
[5] pry(main)> pps.execute
=> #<ServiceResponse:0x00000003297f4848 @http_status=:ok, @message="OK", @payload={}, @reason=nil, @status=:success>
Inspecting the log/application_json.log file, we can find the following:
{"severity":"INFO","time":"2025-08-25T03:46:58.738Z","class":"Ci::Slsa::PublishProvenanceService","message":"Performing attestation for artifact","hash":"3c5bba498d6f7a2cb4c195cf0873c8b68c9407f04dfa9acaad7fe4875e5e93f1","path":"test.txt"}
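The logged hash is correct, as we can confirm by computing the SHA-256 of the first entry in the artifacts archive directly: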
> file = Ci::Build.last.job_artifacts.filter { |a| a.file_type == "archive" }[0].file.file
> entry = Zip::File.open(file).entries[0]
> Digest::SHA256.hexdigest(entry.get_input_stream.read)
3c5bba498d6f7a2cb4c195cf0873c8b68c9407f04dfa9acaad7fe4875e5e93f1
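The check above reads the whole entry into memory, which is fine for a small test file. For larger artifacts, one option is to feed the digest in chunks. The following is a sketch only: `streaming_sha256` is a hypothetical helper, and it assumes the input object supports `read(length)` the way Ruby's `IO` and `StringIO` do (here `StringIO` stands in for a zip entry stream).

```ruby
require "digest"
require "stringio"

# Hypothetical helper: computes a SHA-256 hex digest by reading the
# input in fixed-size chunks instead of loading it all at once.
def streaming_sha256(io, chunk_size: 64 * 1024)
  digest = Digest::SHA256.new
  # IO#read(length) returns nil at end of stream, which ends the loop.
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.hexdigest
end

content = "example artifact content\n" * 10_000
streamed = streaming_sha256(StringIO.new(content), chunk_size: 4096)
in_memory = Digest::SHA256.hexdigest(content)

puts streamed == in_memory
```

Both approaches produce the same digest; the chunked version only bounds how much of the artifact is held in memory at a time.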
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #559267 (closed)