Investigate: Intermittent failure on push to container registry with podman
Summary
I'm using GitLab CI (on Kubernetes) to build OCI containers, which are then pushed to the GitLab project container registry.
Occasionally the push fails partway through, on one of the blobs, with a 404 error:
Requesting bearer token: invalid status code from registry 404 (Not Found)
This issue has also been seen by a number of other podman users, see:
- https://github.com/containers/podman/issues/17999
- https://github.com/containers/podman/discussions/16842
Retrying the job typically succeeds, with no change to the configuration or the repository.
I have seen this failure only once in roughly 20 podman builds so far, as I am still migrating from docker to podman.
I'm running self-hosted GitLab Enterprise Edition 14.10.5-ee with the current stable podman release. Others have reported it on gitlab-ce 15.9.1 and GitLab 15.6.1-ee.
Steps to reproduce
build container:
  stage: build
  image: quay.io/podman/stable
  script:
    - podman login -u gitlab-ci-token -p ${CI_JOB_TOKEN} ${CI_REGISTRY}
    - podman pull ${CI_REGISTRY_IMAGE}:latest || true
    - podman build ${BUILD_ARGS} --cache-from ${CI_REGISTRY_IMAGE} --tag ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA} .
    # refresh login and push image
    - podman login -u gitlab-ci-token -p ${CI_JOB_TOKEN} ${CI_REGISTRY}
    - podman push ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}
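Since a retry usually succeeds, one possible stopgap is to retry only the push step rather than the whole job. The sketch below is not part of the original job: the `retry` function name, the attempt count and the delay are all assumptions chosen for illustration, and it only masks the 404 rather than fixing its cause.

```shell
# Hypothetical helper (not from the original job): run a command up to
# $1 times, sleeping $2 seconds between failed attempts.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while true; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s: $*" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}

# Assumed usage in the job script, replacing the plain push:
#   retry 3 5 podman push "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
```

The function would have to be defined in the same shell session as the push, for example earlier in the same `script:` section. GitLab CI's job-level `retry` keyword is an alternative, but it reruns the whole job, including the build.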
What is the current bug behavior?
$ podman login -u gitlab-ci-token -p ${CI_JOB_TOKEN} ${CI_REGISTRY}
Login Succeeded!
$ podman push ${CI_REGISTRY_IMAGE}${VARIANT}:${SHA}
Getting image source signatures
Copying blob sha256:219c6c2423f19da93c40808cd3f8d84d65059c0b4ae26746e66c5352b5a25282
Copying blob sha256:d8c4bbc6d59ab20be18c2aee16d5b769b4e404f73c28cb3d3950753f102eea88
Copying blob sha256:3af14c9a24c941c626553628cf1942dcd94d40729777f2fcfbcd3b8a3dfccdd6
Copying blob sha256:7766968e58766e7bc774c4ac66783404b195a15f6e4e11c5dc58a1cbae6cf320
Copying blob sha256:16af495390990b358cfa28e324a7bf43b7b23ed05c6d343731a37e387851d9f0
Copying blob sha256:8a2adc4b2731751ec162d07bbec48acb5ca0536f8790694ac3e02257c3b4ce86
Copying blob sha256:0d44fb193a65eb7debec56a7d4d768dcf75b1f4521ffcf9706dfe404635d79b2
Copying blob sha256:88325318780adac08bc72b3e7aabdf90d0f5a04d5b082177c41367c2afbc38d5
Copying blob sha256:81f0f7e1e560c54afbb09ab8420fc65c07a84faa85ad0b58cb2df48523890fb4
Copying blob sha256:634249e18c52cd4ab5c341e980285b53101f060db40764e5df25fbb3c08abf03
Error: trying to reuse blob sha256:16af495390990b358cfa28e324a7bf43b7b23ed05c6d343731a37e387851d9f0 at destination: Requesting bearer token: invalid status code from registry 404 (Not Found)
Cleaning up project directory and file based variables
ERROR: Job failed: command terminated with exit code 1
What is the expected correct behavior?
The push should complete without errors, as it does on retry:
$ podman login -u gitlab-ci-token -p ${CI_JOB_TOKEN} ${CI_REGISTRY}
Login Succeeded!
$ podman push ${CI_REGISTRY_IMAGE}${VARIANT}:${SHA}
Getting image source signatures
Copying blob sha256:6052239faecf96ad31118f8029fa44ebc0a2a173a5c765c4db95e57858a93757
Copying blob sha256:3af14c9a24c941c626553628cf1942dcd94d40729777f2fcfbcd3b8a3dfccdd6
Copying blob sha256:7766968e58766e7bc774c4ac66783404b195a15f6e4e11c5dc58a1cbae6cf320
Copying blob sha256:16af495390990b358cfa28e324a7bf43b7b23ed05c6d343731a37e387851d9f0
Copying blob sha256:219c6c2423f19da93c40808cd3f8d84d65059c0b4ae26746e66c5352b5a25282
Copying blob sha256:8a2adc4b2731751ec162d07bbec48acb5ca0536f8790694ac3e02257c3b4ce86
Copying blob sha256:eb7ea7acabd6213b43e4e3c39d3ffa637d682bbc031c95ba71ff1890a13b36b9
Copying blob sha256:161cda1e2ed5a447eae967335b4a1c13de99f87d72b036134b8651ff6633f8e6
Copying blob sha256:fc93ada0e9e33b58c125ee9254497d20e78f5d4ab552052d8bf64bf9228bb519
Copying blob sha256:2a4f602e42e98cfe86a80f4b5ee3a55a4fdf6141dcc2394955f5ed89b35a9caa
Copying config sha256:47d0c4d2097a993e7d71bb94ddb16b669be08904544074583bec529aca23e70f
Writing manifest to image destination
Storing signatures
Relevant logs and/or screenshots
1.2.3.4 - gitlab-ci-token [05/Dec/2022:06:41:34 +0000] "GET /jwt/auth?account=gitlab-ci-token&scope=repository%3Agroup%2Fproject%2Fnamespace%3Apull%2Cpush&service=container_registry HTTP/1.1" 200 964 "" "Buildah/1.27.2" 1.25
1.2.3.4 - gitlab-ci-token [05/Dec/2022:06:41:34 +0000] "GET /jwt/auth?account=gitlab-ci-token&scope=repository%3Agroup%2Fproject%2Fnamespace%3Apull%2Cpush&service=container_registry HTTP/1.1" 404 9830 "" "Buildah/1.27.2" 5.78
1.2.3.4 - gitlab-ci-token [05/Dec/2022:06:41:34 +0000] "GET /jwt/auth?account=gitlab-ci-token&scope=repository%3Agroup%2Fproject%2Fnamespace%3Apull%2Cpush&service=container_registry HTTP/1.1" 200 965 "" "Buildah/1.27.2" 1.25
1.2.3.4 - gitlab-ci-token [05/Dec/2022:06:41:34 +0000] "GET /jwt/auth?account=gitlab-ci-token&scope=repository%3Agroup%2Fproject%2Fnamespace%3Apull%2Cpush&service=container_registry HTTP/1.1" 200 966 "" "Buildah/1.27.2" 1.25
1.2.3.4 - gitlab-ci-token [05/Dec/2022:06:41:34 +0000] "GET /jwt/auth?account=gitlab-ci-token&scope=repository%3Agroup%2Fproject%2Fnamespace%3Apull%2Cpush&service=container_registry HTTP/1.1" 200 966 "" "Buildah/1.27.2" 1.25
"exception.class":"ActiveRecord::RecordNotFound",
"exception.message":"Couldn't find ContainerRepository",
"exception.backtrace":[
"app/models/container_repository.rb:600:in `find_by_path!'",
"app/models/container_repository.rb:592:in `find_or_create_from_path'",
"app/services/auth/container_registry_authentication_service.rb:162:in `ensure_container_repository!'",
...
https://github.com/containers/podman/discussions/16842#discussioncomment-4403693
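To check how often this happens on a given instance, the access log can be summarised per timestamp and status code. This is a rough sketch that assumes the log format quoted above (status code in the ninth whitespace-separated field, request path in the seventh); the function name and the `access.log` file name are hypothetical.

```shell
# Hypothetical helper: read an access log in the format quoted above on
# stdin and print "<timestamp> <status> <count>" for /jwt/auth requests.
jwt_auth_status_counts() {
  awk '$7 ~ /^\/jwt\/auth/ {
         # $4 looks like "[05/Dec/2022:06:41:34"; drop the leading "[".
         counts[substr($4, 2) " " $9]++
       }
       END { for (k in counts) print k, counts[k] }' | sort
}

# Assumed usage; the real log location depends on the installation:
#   jwt_auth_status_counts < access.log
```

For the five access-log lines quoted above, this reports four responses with status 200 and one with status 404, all at 06:41:34.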
Possible fixes
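Could the "Couldn't find ContainerRepository" error above be exacerbated by the six or so duplicate GET /jwt/auth requests that buildah/podman sends all at once? The access log shows several of them arriving within the same second, with one of them returning 404. This is only an attempt to reason about why the problem appeared with the move to buildah.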
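One way to probe this hypothesis without podman would be to fire several token requests at once and count how many fail. The helper below is a minimal sketch: its name is hypothetical, the host `gitlab.example.com` is a placeholder, the use of basic auth with the job token is an assumption, and it has not been verified to reproduce the 404.

```shell
# Hypothetical helper: start $1 copies of the given command in parallel,
# wait for all of them, and print how many exited non-zero.
count_parallel_failures() {
  copies=$1
  shift
  pids=""
  i=0
  while [ "$i" -lt "$copies" ]; do
    "$@" >/dev/null 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  failures=0
  for pid in $pids; do
    # wait returns the exit status of that specific background process.
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"
}

# Assumed usage (placeholder host; query string copied from the log above).
# curl --fail exits non-zero on HTTP errors such as 404:
#   count_parallel_failures 6 curl --silent --fail \
#     --user "gitlab-ci-token:${CI_JOB_TOKEN}" \
#     "https://gitlab.example.com/jwt/auth?account=gitlab-ci-token&scope=repository%3Agroup%2Fproject%2Fnamespace%3Apull%2Cpush&service=container_registry"
```

A non-zero count from repeated runs would support the idea that concurrent token requests trigger the failure; a count of zero would not rule it out.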