Docker push from CI returns 500 Internal Server Error and 422 Unprocessable Entity

Description of the problem

We have a CI job that uses a shared runner and pushes multi-platform Docker images to the Docker registry on gitlab.com. Recently, that job stopped working. First it reported a 500 Internal Server Error:

#11 DONE 175.0s
#17 exporting to image
#17 exporting layers
#17 exporting layers 37.7s done
#17 exporting manifest sha256:ed808c808e280203e22049770d07ead090f2391d3a94b484b55d95e651d5ad0e 0.0s done
#17 exporting config sha256:39f8c1312087ec361dd9e3fe0510af52b3e11bf1aa5279d5f4efdc47dbe87ed8
#17 exporting config sha256:39f8c1312087ec361dd9e3fe0510af52b3e11bf1aa5279d5f4efdc47dbe87ed8 0.0s done
#17 exporting manifest sha256:9a53dfb6d6f47a930c975d9bc117c75bb5c3bf23fe38594d8114077be6d75210 0.0s done
#17 exporting config sha256:859e2ea8a92f3aa9be93f1c1a7e344937527028403443ad318380d71024d878e 0.0s done
#17 exporting manifest list sha256:9ee6f1f5fc31fca7dd7127fb7122c2afd49bf08a29b7de7f1c618c174ab0c2b6 0.0s done
#17 pushing layers
#17 pushing layers 5.8s done
#17 pushing manifest for registry.gitlab.com/irtlab/lmr-emulator/ambe:latest
#17 pushing manifest for registry.gitlab.com/irtlab/lmr-emulator/ambe:latest 0.6s done
#17 ERROR: unexpected response: 500 Internal Server Error
------
 > exporting to image:
------
failed to solve: rpc error: code = Unknown desc = unexpected response: 500 Internal Server Error
ERROR: Job failed: exit code 1

After that, I tried deleting the old version of the image from the registry and restarting the job. Now the registry reports 422 Unprocessable Entity:

#16 DONE 174.6s
#17 exporting to image
#17 exporting layers
#17 exporting layers 37.5s done
#17 exporting manifest sha256:c754d18bafecba4809e924e58424946ac19c3521de4af2377bfdbd8083593c31 0.0s done
#17 exporting config sha256:4f6405e7c93664b5a663415bd3c790dc5bcc9b5972b545d50809c2dc69e7c9df 0.0s done
#17 exporting manifest sha256:6aba31993fa64f578fc64cbda984d2d9837e0779452b23e146b341470d65cb0e 0.0s done
#17 exporting config sha256:97d48f43c1524497c3df7c6bad727c56b5d7567e5da876e119f1b7ca056938ef 0.0s done
#17 exporting manifest list sha256:577dcab07ff4e6bb7a9064799754ef294ef18972a43acdce6cf18e2265ce895f 0.0s done
#17 pushing layers
#17 pushing layers 0.3s done
#17 ERROR: failed to fetch oauth token: unexpected status: 422 Unprocessable Entity
------
 > exporting to image:
------
failed to solve: rpc error: code = Unknown desc = failed to fetch oauth token: unexpected status: 422 Unprocessable Entity
ERROR: Job failed: exit code 1

Any idea what could be wrong? We have other jobs within the same CI pipeline that also push multi-platform Docker images and those appear to be working fine.

I should also add that our project currently shows the "less than 30% CI minutes available" warning. However, the usage quota page at https://gitlab.com/groups/irtlab/-/usage_quotas#pipelines-quota-tab shows that we still have about 25% of our minutes left.

Update 20 minutes later: I retried the job one more time and it succeeded: https://gitlab.com/irtlab/lmr-emulator/-/jobs/379535294

Maybe it is an intermittent issue?
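
If it does turn out to be intermittent, one workaround on our side could be to retry the build-and-push step a few times before failing the job. Below is a minimal sketch of that idea in Python, not something we currently run. The function name run_with_retry and its parameters are hypothetical, and the command passed to it would be whatever the job already executes.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, base_delay=5.0,
                   sleep=time.sleep, run=subprocess.run):
    """Run cmd, retrying on a non-zero exit code with exponential backoff.

    cmd is a list of arguments, as accepted by subprocess.run.
    Returns the exit code of the last attempt.
    The sleep and run parameters exist only so the helper can be tested
    without waiting or spawning a real process.
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = run(cmd).returncode
        if returncode == 0:
            return 0
        if attempt < attempts:
            # Wait base_delay, then twice that, then four times that, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed with exit code {returncode}; "
                  f"retrying in {delay:.0f}s")
            sleep(delay)
    return returncode
```

This only hides the symptom. It retries on any failure, including genuine build errors, so it would not replace finding out why the registry returns 500 and 422.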

Which Group/Project (with full path) is experiencing the issue?

https://gitlab.com/irtlab/lmr-emulator

The job "build-ambe" in the repository's .gitlab-ci.yml fails with the above errors.

Approximate date/time when the error occurred.

The last time we managed to run the CI pipeline successfully was on December 10, 2019. The build-ambe job has been failing since then.

Describe what you were doing right before the issue occurred.

Nothing special. We just pushed more commits to the git repository and let the CI pipeline rebuild all images.

Edited by Jan Janak