unexpected order of artifact downloads
if a job depends on multiple jobs from previous stages (e.g. implicitly on all jobs run so far, because no dependencies
were declared), the runner fetches the artifacts in an unexpected order.
notably, the order seems to be: the artifacts of the latest jobs are downloaded first.
this seems counter-intuitive: i would have expected jobs from earlier stages to be downloaded before jobs from later stages.
use-case
i discovered the issue while experimenting with codesigning in the CI.
the pipeline is like this:
stages:
  - build
  - test
  - sign
  - deploy

.artifacts:
  artifacts:
    name: ${CI_PROJECT_NAME}_${CI_COMMIT_REF_NAME}_${CI_JOB_NAME#*:}
    paths:
      - package/

build:Linux: # creates package/binary-linux
  extends: .artifacts
  stage: build
  script:
    - make install DESTDIR=$(pwd)/package

build:macOS: # creates package/binary-macOS
  extends: build:Linux
  tags:
    - macOS

.test:

test:Linux:
  stage: test
  dependencies:
    - build:Linux
  script:
    - ./package/binary-linux --selftest

test:macOS:
  stage: test
  dependencies:
    - build:macOS
  script:
    - ./package/binary-macOS --selftest

sign:macOS: # modifies package/binary-macOS
  extends: .artifacts
  stage: sign
  tags:
    - macOS
  dependencies:
    - build:macOS
  script:
    - codesign package/binary-macOS
  allow_failure: true

package: # creates a package containing both package/binary-linux and package/binary-macOS
  stage: deploy
  script:
    - mk-multiOS-package package/
the important parts being:
- the build jobs all create artifacts in the same folder ./package/ (the build system makes sure that the build products of the various jobs have different names, so they don't overwrite each other)
- the final deploy stage creates a single package with all the binaries from the various build-jobs
this pipeline is extended by an optional sign job.
the sign job will modify some of the binaries (by signing them).
expected behaviour
when all the jobs run successfully, i expect that the final package contains both package/binary-linux and the signed package/binary-macOS.
more generally: if two stages create artifacts with the same filename, i would expect that the artifacts from later stages overwrite the artifacts from earlier stages.
observed behaviour
to my dismay i discovered that the macOS binaries were not signed, despite the signing job having completed successfully.
on closer inspection, it turned out that the runner would first download and extract the artifact from the sign job, and then download and extract the artifacts from the build jobs, thus overwriting the newer files (from the later stage) with the older ones.
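the effect of the extraction order can be illustrated with a small simulation. this is only a sketch, not the runner's actual code: the extract_all helper and the file contents are made up for illustration, and the workspace is modelled as a plain dict in which a later extraction overwrites an earlier one.

```python
def extract_all(artifacts_in_download_order):
    """Simulate unpacking artifacts into one workspace.

    Files extracted later overwrite files of the same path extracted earlier.
    """
    workspace = {}
    for _job_name, files in artifacts_in_download_order:
        for path, content in files.items():
            workspace[path] = content
    return workspace


# Hypothetical artifact contents, modelled on the pipeline above.
build_linux = ("build:Linux", {"package/binary-linux": "linux, unsigned"})
build_macos = ("build:macOS", {"package/binary-macOS": "macOS, unsigned"})
sign_macos = ("sign:macOS", {"package/binary-macOS": "macOS, signed"})

# Observed: the sign artifact is extracted first, then the build artifacts.
observed = extract_all([sign_macos, build_linux, build_macos])

# Expected: earlier stages first, so the later stage wins.
expected = extract_all([build_linux, build_macos, sign_macos])

print(observed["package/binary-macOS"])  # macOS, unsigned
print(expected["package/binary-macOS"])  # macOS, signed
```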
Proposal
Add a sort by stage_idx when querying dependencies from previous stages: https://gitlab.com/gitlab-org/gitlab/-/blob/915bfa4f0d0e37db3506ac50c00e77b6d674d15e/app/models/ci/build_dependencies.rb#L132
Note: there is only one existing index that includes stage_idx. We need to verify whether this index will be used. Otherwise, we would need to create a new index on ci_builds, possibly via async index creation.
CREATE INDEX index_ci_builds_on_commit_id_and_stage_idx_and_created_at ON ci_builds USING btree (commit_id, stage_idx, created_at);
Out of scope: ordering of DAG dependencies (those using the needs keyword). For now, we won't specifically address the DAG case and may need a follow-up issue to solve it.
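The intended effect of the sort can be sketched as follows. This is a Python illustration only, not the Rails query in build_dependencies.rb: the Job class and download_order function are hypothetical, and stage_idx is assumed to be the zero-based position of the job's stage in the stages list (build = 0, test = 1, sign = 2, deploy = 3).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    stage_idx: int  # assumed: position of the job's stage in the `stages` list


def download_order(jobs):
    """Return jobs ordered so that earlier stages are downloaded first.

    sorted() is stable, so jobs within the same stage keep their input order.
    """
    return sorted(jobs, key=lambda job: job.stage_idx)


# Input order mimics the observed behaviour: the sign job comes first.
jobs = [Job("sign:macOS", 2), Job("build:Linux", 0), Job("build:macOS", 0)]
ordered = [job.name for job in download_order(jobs)]
print(ordered)  # ['build:Linux', 'build:macOS', 'sign:macOS']
```

Because only the stage index is used as the sort key, the relative order of jobs within one stage is left as it was, which matches the view that ordering inside a stage does not need to be defined.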
remarks
i'm not entirely sure how the order of downloading the artifacts relates to the "job order". my real pipeline contains 7 build jobs, 0 test jobs, 1 sign job and 1 deploy job. in the deploy job, the artifacts of the sign job get downloaded first, and those of the build jobs get downloaded later, however not in the reverse order of job creation. it might be that the download order is the reverse of the order in which the jobs finished; alternatively the order might just be undefined.
i totally understand that the download order for artifacts of jobs of the same stage is undefined (as is the order of execution and termination of those jobs).
however, the stages allow me to enforce a given order of job-sets, and i think the artifact downloading must follow this order rather than ignore or invert it (so: later stages should overwrite earlier stages).
dependencies?
Theoretically I could use the dependencies keyword to have my package job only pull in artifacts from build:Linux and sign:macOS.
however, i don't like this at all, because:
- I can no longer add build-jobs for other build-flavours without also having to update the package job.
  - right now, I only need to add another build-job, make sure it creates unique artifact files in the shared artifacts/ directory (something my build system already takes care of), and everything is packaged automatically
- the sign-job is no longer optional
  - signing the binaries is not considered to be of the same importance as compiling or packaging them. using stages, we can make the job purely optional (if it runs successfully, some binaries in the package will be signed; otherwise those binaries won't be signed, which is fine).
environment
this is on a self-hosted GitLab-CE instance (omnibus-13.9.2)