Disable project services that fail to connect after some time
There are project services in use that fail repeatedly over time, and it's highly likely that they will never succeed. I suggest that we:

* Track consecutive failures
* Track last successful connection
* If the number of failures and the time since the last successful connection exceed a threshold, disable the service and notify the project owner.

This is one example of someone using a Drone CI service that no longer exists in DNS: https://sentry.gitlap.com/gitlab/gitlabcom/issues/111116/

```
SocketError: getaddrinfo: Name or service not known
  app/models/project_services/drone_ci_service.rb:52:in `calculate_reactive_cache'
    response = HTTParty.get(commit_status_path(sha, ref), verify: enable_ssl_verification)
  lib/gitlab/metrics/instrumentation.rb:159:in `block in calculate_reactive_cache'
    .measure { super }
  lib/gitlab/metrics/method_call.rb:36:in `measure'
    retval = yield
  lib/gitlab/metrics/instrumentation.rb:159:in `calculate_reactive_cache'
    .measure { super }
  app/models/concerns/reactive_caching.rb:84:in `block (3 levels) in exclusively_update_reactive_cache!'
    new_value = calculate_reactive_cache(*args)
... (75 additional frame(s) were not displayed)

SocketError: Failed to open TCP connection to x.y.z.cloudapp.azure.com:80 (getaddrinfo: Name or service not known)
  app/models/project_services/drone_ci_service.rb:52:in `calculate_reactive_cache'
    response = HTTParty.get(commit_status_path(sha, ref), verify: enable_ssl_verification)
  lib/gitlab/metrics/instrumentation.rb:159:in `block in calculate_reactive_cache'
    .measure { super }
  lib/gitlab/metrics/method_call.rb:36:in `measure'
    retval = yield
  lib/gitlab/metrics/instrumentation.rb:159:in `calculate_reactive_cache'
    .measure { super }
  app/models/concerns/reactive_caching.rb:84:in `block (3 levels) in exclusively_update_reactive_cache!'
    new_value = calculate_reactive_cache(*args)
... (74 additional frame(s) were not displayed)

Failed to open TCP connection to x.y.z.cloudapp.azure.com:80 (getaddrinfo: Name or service not known)
```
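
To make the proposal above more concrete, here is a rough sketch in plain Ruby of how the failure tracking could work. Everything in it is hypothetical: the `ServiceHealth` class, both threshold constants and their values, and the `notifier` callable are placeholders for illustration, not existing GitLab code. A real implementation would presumably persist the counters on the service record and send a proper notification to the project owner.

```ruby
# Hypothetical sketch only: none of these names exist in GitLab.
class ServiceHealth
  # Assumed thresholds; the right values would need discussion.
  MAX_CONSECUTIVE_FAILURES = 10
  MAX_SECONDS_WITHOUT_SUCCESS = 7 * 24 * 60 * 60 # one week

  attr_reader :consecutive_failures, :last_success_at

  # notifier is any callable that receives a message string
  # (stand-in for notifying the project owner).
  def initialize(now: Time.now, notifier: ->(message) { warn(message) })
    @consecutive_failures = 0
    @last_success_at = now # treat creation time as the baseline
    @active = true
    @notifier = notifier
  end

  def active?
    @active
  end

  # Any successful connection resets the failure streak.
  def record_success(now: Time.now)
    @consecutive_failures = 0
    @last_success_at = now
  end

  # Count the failure; disable and notify once BOTH thresholds are reached.
  def record_failure(error, now: Time.now)
    return unless @active

    @consecutive_failures += 1
    return unless thresholds_reached?(now)

    @active = false
    @notifier.call(
      "Service disabled after #{@consecutive_failures} consecutive failures " \
      "(last error: #{error.message})"
    )
  end

  private

  def thresholds_reached?(now)
    @consecutive_failures >= MAX_CONSECUTIVE_FAILURES &&
      (now - @last_success_at) >= MAX_SECONDS_WITHOUT_SUCCESS
  end
end
```

Requiring both conditions means a short outage that produces many quick failures would not disable the service, and neither would a rarely used service that fails once after a long idle period. In the Drone CI case above, the `rescue` around the `HTTParty.get` call would call `record_failure`, and a successful response would call `record_success`.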
issue