Handle 503 status when uploading artifacts and the object storage is unavailable
Background
In gitlab#196823 (closed) we observed object storage returning 503 errors, which Rails surfaced to the Runner as 500 responses. Because the Runner retries the upload 3 times in quick succession, all of the attempts can fail while object storage is still unavailable.
Proposal
We can catch the object storage errors on the Rails side and return a 503 status code instead. The Runner can then distinguish this case from a generic 500, back off between retries, and perhaps increase the number of retries to 5.
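The Runner-side half of this proposal could look roughly like the sketch below. It is illustrative only and is not the Runner's actual implementation: the function name `upload_with_backoff`, the `upload` callable, the exponential delay schedule, and the 1-second base delay are all assumptions. It covers only the 503 path and hands any other status, including 500, straight back to the caller.

```python
import time
from typing import Callable

SERVICE_UNAVAILABLE = 503


def upload_with_backoff(
    upload: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call upload() until it returns a non-503 status or attempts run out.

    `upload` is a hypothetical callable that performs one upload request
    and returns the HTTP status code. The delay doubles after every 503
    (an assumed schedule): base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    status = SERVICE_UNAVAILABLE
    for attempt in range(max_attempts):
        status = upload()
        if status != SERVICE_UNAVAILABLE:
            # Success or a different error: let the caller decide what to do.
            return status
        if attempt < max_attempts - 1:
            # Object storage is unavailable: wait before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return status
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. With the defaults, five consecutive 503 responses would spread the attempts over about 15 seconds (1 + 2 + 4 + 8) instead of firing them in quick succession.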
Updates
- 2020-03-09: We've merged !1887 (merged), which implements the retry and backoff logic for the 503 status code. We will deploy it to the shared Runner fleet later this week.
Edited by Steve Xuereb