Container registry fails to read/write from CIFS share after upgrading to GitLab 13.3.5
Since we upgraded to GitLab 13.3.5, all of our CI builds have been failing with errors like:
"received unexpected HTTP status: 500 Internal Server Error"
The registry service log contains many entries like:
2020-09-10_02:49:10.76253 time="2020-09-10T02:49:10.762407371Z" level=error msg="response completed with error" auth.user.name=max err.code=unknown err.detail="map[data:filesystem: readdirent: interrupted system call]" err.message="unknown error" go.version=go1.14.7 http.request.host="gitlab.y.urbanlogiq.com:4567" http.request.id=affd0949-568c-420d-9841-f128b151e9ad http.request.method=GET http.request.remoteaddr=10.64.0.12 http.request.uri="/v2/_catalog?n=100" http.request.useragent=Spinnaker/6.5.0-20200121133956 http.response.contenttype="application/json; charset=utf-8" http.response.duration=458.370903ms http.response.status=500 http.response.written=139
I did some digging and found a report for a completely different service that exactly mirrors what we are seeing: https://forum.restic.net/t/prune-fails-on-cifs-repo-using-go-1-14-build/2579/9
For what it's worth, setting GODEBUG=asyncpreemptoff=1 (by creating a file named GODEBUG in /opt/gitlab/etc/registry/env with the content asyncpreemptoff=1) resolved the issue.
This request is for the GitLab registry to handle EINTR gracefully rather than erroring out, or to set GODEBUG=asyncpreemptoff=1 by default.