Geo: fix container repository sync for OCI image indexes
What does this MR do?
Geo::ContainerRepositorySync silently failed to sync container
repository tags whose manifest is an OCI image index. Tag counts
looked fine but docker pull from the secondary returned
manifest unknown. Confirmed in a customer environment where 52 of
62 tags in one repository were being silently skipped on every sync
cycle.
Four coordinated fixes, each in its own commit:
- Widen ACCEPTED_TYPES: the default Faraday connection's Accept header now includes application/vnd.docker.distribution.manifest.list.v2+json and application/vnd.oci.image.index.v1+json, so HEAD requests for fat manifests return a real digest. Resolves the customer symptom.
- Submanifest mediaType fallback: when the OCI image manifest body omits mediaType (optional per the spec, often omitted by buildkit), fall back to the descriptor's mediaType from the parent index, then to OCI_MANIFEST_V1_TYPE. Also tightens the existing OCI test to assert the specific Content-Type that was previously matched with anything, which is the gap that hid this bug.
- Read secondary_tags from the Docker V2 client: mirror primary_tags's code path so both sides resolve digests through the same client. Eliminates residual digest-format asymmetry that could trigger false MATCH / MISMATCH on primary_tags - secondary_tags. Collapses the now-meaningless "GitLab API is supported / not supported" context splits in the spec.
- Clean up orphan tags with unresolvable digests: remove_tag falls back to deleting by tag name when the digest is absent, using the OCI tag-delete endpoint (DELETE /v2/<path>/manifests/<tag>, Container Registry 16.4+). Closes the orphan-tag cleanup gap from #465580 that the Accept-header fix on its own only prevents going forward.
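The mediaType fallback order from the second fix can be sketched as below. This is a minimal illustration, not the MR's code: the method name submanifest_media_type and its arguments are hypothetical, and the value assigned to OCI_MANIFEST_V1_TYPE is assumed to be the standard OCI image manifest media type.

```ruby
# Assumed value: the standard OCI image manifest media type.
OCI_MANIFEST_V1_TYPE = 'application/vnd.oci.image.manifest.v1+json'

# Hypothetical helper illustrating the fallback order.
# manifest_body: parsed JSON of the submanifest (Hash)
# descriptor:    the matching entry from the parent index's "manifests" array (Hash)
def submanifest_media_type(manifest_body, descriptor)
  candidates = [manifest_body['mediaType'], descriptor['mediaType']]

  # Take the first non-empty string; otherwise use the OCI default.
  candidates.find { |type| type.is_a?(String) && !type.empty? } || OCI_MANIFEST_V1_TYPE
end

docker_v2 = 'application/vnd.docker.distribution.manifest.v2+json'

# The body carries its own mediaType, so it wins.
puts submanifest_media_type({ 'mediaType' => docker_v2 }, { 'mediaType' => OCI_MANIFEST_V1_TYPE })

# The body omits mediaType (as buildkit often does), so the descriptor's value is used.
puts submanifest_media_type({}, { 'mediaType' => docker_v2 })

# Neither side provides one, so the default applies.
puts submanifest_media_type({}, {})
```

The first two calls print the Docker V2 manifest type and the last prints the OCI default.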
How to test
Run the affected specs
    bundle exec rspec \
      ee/spec/services/geo/container_repository_sync_spec.rb \
      spec/lib/container_registry/client_spec.rb \
      ee/spec/lib/container_registry/client_spec.rb

All 109 examples pass locally.
Reproduce the bug end-to-end on GDK with Geo
Requires a Geo primary + secondary running locally, container registry
enabled on both, and docker buildx available.
1. Build and push a multi-arch image to a project on the primary, so the resulting manifest is an OCI image index:

       docker buildx create --use --name local-builder
       docker buildx build \
         --platform linux/amd64,linux/arm64 \
         -t <primary-host>:5005/<group>/<project>:latest \
         --push \
         - <<EOF
       FROM alpine:latest
       EOF

2. Confirm the primary stores an OCI image index:

       docker manifest inspect <primary-host>:5005/<group>/<project>:latest | jq -r .mediaType
       # application/vnd.oci.image.index.v1+json

3. Trigger Geo container repository sync from the secondary's Rails console:

       cr = ContainerRepository.find_by(path: '<group>/<project>')
       Geo::ContainerRepositorySync.new(cr).execute

4. Before this MR: docker pull from the secondary fails with manifest unknown:

       docker pull <secondary-host>:5005/<group>/<project>:latest
       # manifest unknown: manifest unknown

5. Check out this MR's branch, restart the Rails console, repeat step 3 once more, then repeat step 4. The pull now succeeds.
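If jq is not available, the media-type check in the second step can be approximated in plain Ruby. This is a small sketch under assumptions: the helper name image_index? is hypothetical, and the fallback for a missing mediaType (treating a top-level manifests array as an index) is a heuristic rather than anything this MR implements.

```ruby
require 'json'

# Media types that identify a multi-platform ("fat") manifest.
INDEX_MEDIA_TYPES = [
  'application/vnd.docker.distribution.manifest.list.v2+json',
  'application/vnd.oci.image.index.v1+json'
].freeze

# Hypothetical helper: true when the manifest JSON describes an image index.
def image_index?(manifest_json)
  manifest = JSON.parse(manifest_json)
  media_type = manifest['mediaType']
  return INDEX_MEDIA_TYPES.include?(media_type) if media_type

  # Heuristic for bodies that omit mediaType: indexes list their
  # submanifests under a top-level "manifests" array.
  manifest['manifests'].is_a?(Array)
end

# Inline sample shaped like an OCI image index (digests omitted for brevity).
sample = JSON.generate(
  'schemaVersion' => 2,
  'mediaType' => 'application/vnd.oci.image.index.v1+json',
  'manifests' => []
)
puts image_index?(sample)
```

In practice the input would be the output of the docker manifest inspect command from the second step, for example captured to a file and read with File.read.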
Confirm the fix on a real Geo deployment
If you have access to an affected Geo secondary, run snippets A–F
from the issue description's "Verification snippets" section before
and after deploying this MR. Snippet B (narrow vs wide Accept
HEAD) is the most direct signal: narrow_digest should go from
nil to a real sha256:....
The workaround script in the issue description performs the same reconciliation that this MR fixes, so running it serves as an additional sanity check that the fix produces the expected end state.
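For readers without access to the issue, a narrow-versus-wide Accept HEAD comparison could look roughly like the sketch below. It is not the issue's Snippet B: the constant and method names are hypothetical, the "narrow" list is an assumption about the pre-fix ACCEPTED_TYPES, and the registry URL and token in the usage comment are placeholders.

```ruby
require 'net/http'
require 'uri'

# Assumed pre-fix Accept list: single-image manifest types only.
NARROW_TYPES = [
  'application/vnd.docker.distribution.manifest.v2+json',
  'application/vnd.oci.image.manifest.v1+json'
].freeze

# The two index types this MR adds to the Accept header.
INDEX_TYPES = [
  'application/vnd.docker.distribution.manifest.list.v2+json',
  'application/vnd.oci.image.index.v1+json'
].freeze

WIDE_TYPES = (NARROW_TYPES + INDEX_TYPES).freeze

# Build the Accept header value from a list of media types.
def accept_header(types)
  types.join(', ')
end

# Issue HEAD /v2/<path>/manifests/<tag> and return the Docker-Content-Digest
# response header, or nil when the registry does not answer with a 2xx.
def head_digest(registry_url, path, tag, types, token: nil)
  uri = URI.join(registry_url, "/v2/#{path}/manifests/#{tag}")
  request = Net::HTTP::Head.new(uri)
  request['Accept'] = accept_header(types)
  request['Authorization'] = "Bearer #{token}" if token

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) ? response['Docker-Content-Digest'] : nil
end

# Usage (placeholders, not executed here):
#   url = 'https://registry.example.com'
#   narrow_digest = head_digest(url, 'group/project', 'latest', NARROW_TYPES, token: ENV['REGISTRY_TOKEN'])
#   wide_digest   = head_digest(url, 'group/project', 'latest', WIDE_TYPES, token: ENV['REGISTRY_TOKEN'])
```

For a tag whose manifest is an image index, the expectation is that the narrow request yields no digest while the wide request returns a sha256 value.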
References
- Closes #600486
- Closes #465580