Resolve "Calculate sha256 digest of artifact on PublishProvenanceService"

Background

Definitions

  • Job Artifacts: One or more files generated by a CI job and saved as declared by the artifacts directive in the CI configuration.
  • Artifacts Bundle: The internal storage mechanism GitLab uses to persist job artifacts in the backend. The bundle is a zip file containing all the individual job artifacts for a particular job.

What are "SLSA provenance statements"?

group::pipeline security is working towards providing users with SLSA Level 3 provenance attestations. The SLSA documentation describes provenance as follows:

It’s the verifiable information about software artifacts describing where, when, and how something was produced. For higher SLSA levels and more resilient integrity guarantees, provenance requirements are stricter and need a deeper, more technical understanding of the predicate. Describe how an artifact or set of artifacts was produced so that:

  • Consumers of the provenance can verify that the artifact was built according to expectations.
  • Others can rebuild the artifact, if desired.

As a simplified TL;DR: in the context of GitLab, a provenance statement is a JSON document that correlates the SHA-256 digest of an artifact with its build information. A worker then digitally signs the statement, and the signed result is called a provenance attestation. This is a highly sought-after feature, particularly for our GitLab Ultimate customers.
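To make the shape of such a statement concrete, here is a minimal sketch in Ruby. It follows the public in-toto Statement v1 layout (`_type`, `subject`, `predicateType`, `predicate`), but it is not necessarily what PublishProvenanceService emits. The helper name `provenance_statement`, the artifact contents, and the build type URL are hypothetical and exist only for illustration.

```ruby
require 'digest'
require 'json'

# Hypothetical helper (not part of GitLab): builds a statement that ties
# the SHA-256 digest of one artifact to some build information.
def provenance_statement(name:, content:, build_info:)
  {
    "_type" => "https://in-toto.io/Statement/v1",
    "subject" => [
      {
        "name" => name,
        "digest" => { "sha256" => Digest::SHA256.hexdigest(content) }
      }
    ],
    "predicateType" => "https://slsa.dev/provenance/v1",
    # Assumption: a real predicate carries far more detail than this.
    "predicate" => build_info
  }
end

statement = provenance_statement(
  name: "test.txt",
  content: "example artifact contents\n",
  build_info: {
    "buildDefinition" => {
      "buildType" => "https://example.com/hypothetical-build-type"
    }
  }
)

puts JSON.pretty_generate(statement)
```

The important part is the `subject` entry: it names the individual artifact and its digest, not the artifacts bundle, so a consumer can match the statement against the file they actually downloaded.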

Why is this change required

We are required to generate provenance attestations that correlate build information with the hash of a specific artifact. Our current implementation, however, attests the artifacts bundle instead. This has some disadvantages:

  • The "artifacts bundle" is an internal GitLab mechanism that has no meaning to our users. For example, they generally do not distribute this bundle to their users, but rather the artifacts themselves.
  • We do not currently store a correlation between the SHA-256 of the artifacts and an artifacts bundle, which makes it impossible to achieve the desired architecture, particularly requirements such as "The API is queried with the SHA-256 of the artifact and returns the Sigstore bundle if found".

References

How to set up and validate locally

Run the following in the rails console:

[4] pry(main)> build = Ci::Build.last
[...]
[4] pry(main)> pps = Ci::Slsa::PublishProvenanceService.new(build)
=> #<Ci::Slsa::PublishProvenanceService:0x0000000329fd19a8
 @build=
[...]
[5] pry(main)> pps.execute
=> #<ServiceResponse:0x00000003297f4848 @http_status=:ok, @message="OK", @payload={}, @reason=nil, @status=:success>

Inspecting the log/application_json.log file, we can find the following:

{"severity":"INFO","time":"2025-08-25T03:46:58.738Z","class":"Ci::Slsa::PublishProvenanceService","message":"Performing attestation for artifact","hash":"3c5bba498d6f7a2cb4c195cf0873c8b68c9407f04dfa9acaad7fe4875e5e93f1","path":"test.txt"}

The logged hash is correct, as we can confirm by computing the digest of the archive entry directly:

> file = Ci::Build.last.job_artifacts.filter { |a| a.file_type == "archive" }[0].file.file
> entry = Zip::File.open(file).entries[0]
> Digest::SHA256.hexdigest(entry.get_input_stream.read)
3c5bba498d6f7a2cb4c195cf0873c8b68c9407f04dfa9acaad7fe4875e5e93f1
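
The console check above reads the whole entry into memory before hashing it, which is fine for a small test file. For larger artifacts, a chunked digest may be preferable. The following is a minimal sketch, not the code in this MR: `streaming_sha256` is a hypothetical helper, and a `StringIO` stands in for the zip entry stream, on the assumption that the stream supports `read(length)` like a regular IO.

```ruby
require 'digest'
require 'stringio'

# Hypothetical helper: hashes an IO-like object in fixed-size chunks so the
# full contents never need to be held in memory at once.
def streaming_sha256(io, chunk_size: 64 * 1024)
  digest = Digest::SHA256.new
  # IO#read(length) returns nil at end of stream, which ends the loop.
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.hexdigest
end

# StringIO stands in for the stream of a single artifact inside the bundle.
contents = "example artifact contents\n"
streamed = streaming_sha256(StringIO.new(contents), chunk_size: 8)

# The chunked result matches hashing the whole string in one call.
puts streamed == Digest::SHA256.hexdigest(contents)
```

The small `chunk_size` in the usage example only forces several iterations; the result is independent of the chunk size.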

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #559267 (closed)

Edited by Sam Roque-Worcel
