Handle all possible network errors in Gitlab::HTTP
The following discussion from !73507 (merged) should be addressed:
@.luke started a discussion: (+3 comments)
Hi, @mkaeppler! You reviewed a very similar MR !67555 (merged) a few months ago. Initially that MR had this change in it !67555 (comment 643559930); however, I removed it later to focus on the main villain at the time. That thread has some discussion about the nature of the error and how it's closely related to Errno::EHOSTUNREACH, which we currently handle as a Gitlab::HTTP::HTTP_ERRORS rescuable error. Could you please review this MR? Thank you!
We currently maintain a hand-crafted list of HTTP-related errors in https://gitlab.com/gitlab-org/gitlab/-/blob/5d7b5696955c2928d82adaf50ed73f53cb1939bf/lib/gitlab/http.rb#L16 that we recover from gracefully. We have been slowly extending this list over time as we discovered errors that were missing from it.
How can we make this future-proof?
For instance, I was asking myself if there is a common parent error class for these that we could catch instead, or if there is a platform array that holds all network-related Errnos (these are integers). I looked into this a bit and it looks like the answer is: not really.
Errno:: error types are synthesized from platform-specific constants, and they inherit from SystemCallError:
[13] pry(main)> Errno::ECONNREFUSED.ancestors
=> [Errno::ECONNREFUSED, SystemCallError, StandardError, Exception, ...
Catching SystemCallError would be too broad though. I was then thinking that maybe we can find all network-related constants and map them to their error types. There is Errno.constants, but it lists all possible error conditions (154 on my Linux machine).
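To illustrate why SystemCallError is too broad, here is a minimal sketch. The helper name rescue_system_call_error and the file name config.yml are made up for this example and are not part of Gitlab::HTTP. It shows that a blanket rescue catches a file system error just as readily as a network one:

```ruby
# Hypothetical helper: a rescue that is deliberately too broad.
# Returns the class of the rescued error, or nil if nothing was raised.
def rescue_system_call_error
  yield
  nil
rescue SystemCallError => e
  e.class
end

# A genuine network error is caught, as intended.
network_error = rescue_system_call_error { raise Errno::ECONNREFUSED }

# A missing file has nothing to do with networking, yet it is caught too.
unrelated_error = rescue_system_call_error { raise Errno::ENOENT, 'config.yml' }

puts network_error
puts unrelated_error

# Number of constants defined under Errno on this platform.
puts Errno.constants.size
```

Both blocks are rescued, so a caller could not tell a network failure apart from an unrelated system call failure.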
Also, the manpage for errno says:
The error numbers that correspond to each symbolic name vary across UNIX systems, and even across different architectures on Linux. Therefore, numeric values are not included as part of the list of error names below.
Since we support various Linux distributions and architectures, we can't be sure that relying on numeric values would lead to predictable results.
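One way to sidestep the numbering problem is to refer to errors by symbolic name only and let Ruby resolve the platform's number. The sketch below is an illustration, not existing GitLab code; the variable names are invented. It reads the integer from the Errno constant that each Errno:: class carries:

```ruby
# Symbolic names are stable across platforms; the integers behind them are not.
names = %i[ECONNREFUSED EHOSTUNREACH ENETUNREACH]

# Map each name to whatever number this platform assigns to it.
errno_numbers = names.to_h do |name|
  [name, Errno.const_get(name)::Errno]
end

errno_numbers.each do |name, number|
  puts format('%-14s errno=%d', name, number)
end
```

The printed numbers may differ between machines, which is why any list we maintain should be keyed on the class names rather than the integers.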
I therefore did a manual sanity check, and grep'ed through the supported errors on my Linux to see what we might be missing:
[8:05:45] work/gl-gck::master ✗ errno -l | egrep 'NET|ADDR|HOST|REMOTE|SOCK|PROTO'
ENONET 64 Machine is not on the network
EREMOTE 66 Object is remote
EPROTO 71 Protocol error
ENOTSOCK 88 Socket operation on non-socket
EDESTADDRREQ 89 Destination address required
EPROTOTYPE 91 Protocol wrong type for socket
ENOPROTOOPT 92 Protocol not available
EPROTONOSUPPORT 93 Protocol not supported
ESOCKTNOSUPPORT 94 Socket type not supported
EADDRINUSE 98 Address already in use
EADDRNOTAVAIL 99 Cannot assign requested address
ENETDOWN 100 Network is down
ENETUNREACH 101 Network is unreachable
ENETRESET 102 Network dropped connection on reset
EHOSTDOWN 112 Host is down
EHOSTUNREACH 113 No route to host
EREMOTEIO 121 Remote I/O error
All of these strike me as plausible to occur during TCP socket I/O.
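As a rough sketch of how such a list could be built by name rather than by number, the snippet below resolves the names from the grep output into Errno:: classes and skips any that the running platform does not provide. CANDIDATE_NAMES, resolve_errno_classes and NETWORK_ERRNO_CLASSES are hypothetical names for this example and do not exist in Gitlab::HTTP. The Errno::NOERROR check is a defensive assumption that Ruby may alias unsupported errors to that placeholder class.

```ruby
# Names taken from the `errno -l` output above.
CANDIDATE_NAMES = %i[
  ENONET EREMOTE EPROTO ENOTSOCK EDESTADDRREQ EPROTOTYPE ENOPROTOOPT
  EPROTONOSUPPORT ESOCKTNOSUPPORT EADDRINUSE EADDRNOTAVAIL ENETDOWN
  ENETUNREACH ENETRESET EHOSTDOWN EHOSTUNREACH EREMOTEIO
].freeze

# Resolve names to Errno classes, skipping anything this platform lacks.
def resolve_errno_classes(names)
  names.filter_map do |name|
    next unless Errno.const_defined?(name)

    klass = Errno.const_get(name)
    # Assumption: unsupported errors may resolve to the Errno::NOERROR placeholder.
    next if Errno.const_defined?(:NOERROR) && klass == Errno::NOERROR

    klass
  end.uniq
end

NETWORK_ERRNO_CLASSES = resolve_errno_classes(CANDIDATE_NAMES).freeze

# Usage: rescue only the resolved network-related errors.
begin
  raise Errno::ENETUNREACH
rescue *NETWORK_ERRNO_CLASSES => e
  puts "recovered from #{e.class}"
end
```

This still relies on a hand-picked list of names, so it does not make the list future-proof by itself; it only makes the lookup tolerant of platform differences.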