Upload artifact fails but job succeeds
Summary
Occasionally, after one of our build jobs is finished, it will start uploading the artifacts, and silently fail: the job succeeds but the artifacts are not uploaded and no errors are reported.
- Jobs that depend on these artifacts then fail, which requires re-running the whole job (as opposed to just re-running the upload)
Steps to reproduce
- I have managed to reproduce the behavior (the helper crashing but job succeeding)
- More details in this project https://gitlab.com/jpsamper/runner-helper-reproducer
- In production, we usually see it when there are a lot of jobs running/uploading artifacts at the same time
- We've seen it with as few as 10 jobs uploading 1.5GB zipped (4-5GB unzipped) concurrently
- By using a custom gitlab-runner-helper with a lot more print statements, we have found that the logs stop after invoking r.client.Do (i.e. if we add a print statement right before and right after, the one right after never appears)
- Naturally, this is the behavior when something goes wrong; when the artifact is uploaded correctly, we see both print statements
Actual behavior
If I understand correctly, the function call linked above is invoking Do from the net/http package, and that call seems to be crashing the gitlab-runner-helper with no additional error message/return code/etc.
Expected behavior
- If an artifact upload fails, the job fails or retries
- An informative error message too, ideally
Relevant logs and/or screenshots
job log
When everything works as expected:
Uploading artifacts...
foo: found 15933 matching files and directories
bar: found 436 matching files and directories
baz: found 1 matching files and directories
qux: found 2 matching files and directories
Uploading artifacts as "archive" to coordinator... ok id=1234567 responseStatus=201 Created token=*****
Job succeeded
And when it doesn't:
Uploading artifacts...
foo: found 15933 matching files and directories
bar: found 436 matching files and directories
baz: found 1 matching files and directories
qux: found 2 matching files and directories
Job succeeded
Environment description
- GitLab Runner with docker executor / kubernetes executor
- Latest docker version
- Default config.toml
Used GitLab Runner version
We're currently on gitlab-runner 13.3.0 but have been seeing this since at least 12.9.0
Possible fixes
- The gitlab-runner-helper does a sanity check that the artifacts have actually been uploaded, and if not tries again
- This could make sense as part of the docker image so that if the gitlab-runner-helper exits unexpectedly, the sanity check can still run
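One possible shape for such a sanity check is sketched below. It is only a heuristic based on the job log format shown in this report: it treats the upload as confirmed when a line starting with "Uploading artifacts as" also reports responseStatus=201. uploadConfirmed is a hypothetical name, and a real check would more likely ask the coordinator directly rather than parse log text.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// uploadConfirmed (hypothetical) reports whether the job log contains the
// confirmation line that appears in this report when an upload succeeds.
func uploadConfirmed(jobLog string) bool {
	sc := bufio.NewScanner(strings.NewReader(jobLog))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "Uploading artifacts as") &&
			strings.Contains(line, "responseStatus=201") {
			return true
		}
	}
	return false
}

func main() {
	// Log excerpts modeled on the two cases shown in this report.
	good := "Uploading artifacts...\n" +
		"qux: found 2 matching files and directories\n" +
		"Uploading artifacts as \"archive\" to coordinator... ok id=1234567 responseStatus=201 Created token=*****\n" +
		"Job succeeded\n"
	silent := "Uploading artifacts...\n" +
		"qux: found 2 matching files and directories\n" +
		"Job succeeded\n"

	fmt.Println("good log confirmed:", uploadConfirmed(good))
	fmt.Println("silent-failure log confirmed:", uploadConfirmed(silent))
}
```

If the check returns false, the wrapper could retry the upload or exit non-zero so the job is marked as failed.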
Edited by Juan Pablo Samper