Disable project services that fail to connect after some time
Some project services in use fail repeatedly over time, and it is highly likely that they will never succeed. I suggest that we:
- Track the number of consecutive failures
- Track the time of the last successful connection
- If the number of consecutive failures and the time since the last successful connection both exceed their thresholds, disable the service and notify the project owner.
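A minimal Ruby sketch of the proposal above. It assumes hypothetical thresholds (MAX_CONSECUTIVE_FAILURES = 10 and MAX_TIME_SINCE_SUCCESS of one week) and uses a hypothetical in-memory ServiceHealth class instead of GitLab's actual service model. notify_owner is only a placeholder for the notification step:

```ruby
# Hypothetical thresholds; the issue does not specify concrete values.
MAX_CONSECUTIVE_FAILURES = 10
MAX_TIME_SINCE_SUCCESS = 7 * 24 * 60 * 60 # seconds (one week)

# Hypothetical in-memory stand-in for the health state of a project service.
class ServiceHealth
  attr_reader :consecutive_failures, :last_success_at, :owner_notified

  def initialize(created_at: Time.now)
    @consecutive_failures = 0
    # Use the creation time as the baseline until the first successful connection.
    @last_success_at = created_at
    @disabled = false
    @owner_notified = false
  end

  def disabled?
    @disabled
  end

  # A successful connection resets the failure streak.
  def record_success(now: Time.now)
    @consecutive_failures = 0
    @last_success_at = now
  end

  # Returns true only when this particular failure disables the service.
  def record_failure(now: Time.now)
    @consecutive_failures += 1
    return false if @disabled
    return false unless should_disable?(now)

    @disabled = true
    notify_owner
    true
  end

  private

  # Both conditions must hold before the service is disabled.
  def should_disable?(now)
    @consecutive_failures >= MAX_CONSECUTIVE_FAILURES &&
      (now - @last_success_at) >= MAX_TIME_SINCE_SUCCESS
  end

  # Placeholder for "notify the project owner" (e.g. an email or a todo).
  def notify_owner
    @owner_notified = true
  end
end
```

Requiring both conditions means a service that was working until recently is not disabled by a short outage, even if it fails many times in a row during that outage.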
For example, one project uses a Drone CI service whose hostname no longer exists in DNS:
https://sentry.gitlap.com/gitlab/gitlabcom/issues/111116/
SocketError: getaddrinfo: Name or service not known
app/models/project_services/drone_ci_service.rb:52:in `calculate_reactive_cache'
  response = HTTParty.get(commit_status_path(sha, ref), verify: enable_ssl_verification)
lib/gitlab/metrics/instrumentation.rb:159:in `block in calculate_reactive_cache'
  .measure { super }
lib/gitlab/metrics/method_call.rb:36:in `measure'
  retval = yield
lib/gitlab/metrics/instrumentation.rb:159:in `calculate_reactive_cache'
  .measure { super }
app/models/concerns/reactive_caching.rb:84:in `block (3 levels) in exclusively_update_reactive_cache!'
  new_value = calculate_reactive_cache(*args)
...
(75 additional frame(s) were not displayed)
SocketError: Failed to open TCP connection to x.y.z.cloudapp.azure.com:80 (getaddrinfo: Name or service not known)
app/models/project_services/drone_ci_service.rb:52:in `calculate_reactive_cache'
  response = HTTParty.get(commit_status_path(sha, ref), verify: enable_ssl_verification)
lib/gitlab/metrics/instrumentation.rb:159:in `block in calculate_reactive_cache'
  .measure { super }
lib/gitlab/metrics/method_call.rb:36:in `measure'
  retval = yield
lib/gitlab/metrics/instrumentation.rb:159:in `calculate_reactive_cache'
  .measure { super }
app/models/concerns/reactive_caching.rb:84:in `block (3 levels) in exclusively_update_reactive_cache!'
  new_value = calculate_reactive_cache(*args)
...
(74 additional frame(s) were not displayed)
Failed to open TCP connection to x.y.z.cloudapp.azure.com:80 (getaddrinfo: Name or service not known)
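Both events are a SocketError raised by the HTTParty.get call in calculate_reactive_cache. One way to turn such errors into tracked failures instead of letting them propagate is sketched below. with_connection_tracking and CONNECTION_ERRORS are hypothetical names, and the list of exception classes treated as "could not connect" is an assumption, not GitLab's actual behavior:

```ruby
require 'socket'  # defines SocketError
require 'timeout' # defines Timeout::Error

# Hypothetical list of errors meaning the endpoint could not be reached at all
# (DNS failure, refused or unreachable host, timeout), as opposed to an
# HTTP-level error response from a reachable server.
CONNECTION_ERRORS = [
  SocketError,
  Errno::ECONNREFUSED,
  Errno::EHOSTUNREACH,
  Timeout::Error
].freeze

# Runs the block (e.g. the HTTParty.get call). On a connection error, reports
# it through on_failure and returns nil; otherwise reports success and
# returns the block's result.
def with_connection_tracking(on_success:, on_failure:)
  result = yield
  on_success.call
  result
rescue *CONNECTION_ERRORS => e
  on_failure.call(e)
  nil
end
```

Exceptions outside the list (for example a bug in the service code itself) still propagate, so they continue to show up in Sentry rather than silently counting toward disabling the service.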
Edited by Stan Hu