Mark unreachable Gitaly nodes as unhealthy shards
On the Geo testbed, I noticed that the geo_project_sync queue held very few jobs, and the Sidekiq logs were filling up with messages like these:
2017-12-13_19:41:26.05866 sidekiq-besteffort-01 sidekiq-cluster: 2017-12-13T19:41:26.058Z 130872 TID-owqy4ten4 WARN: {"context":"Job raised exception","job":{"class":"Geo::ProjectSyncWorker","args":[2729211,"2017-12-13 19:41:14 +0000"],"retry":3,"queue":"geo_project_sync","dead":false,"jid":"6b8b9ca0a3c22a31637c41b8","created_at":1513194074.7552783,"enqueued_at":1513194074.762257,"error_message":"4:Deadline Exceeded","error_class":"GRPC::DeadlineExceeded","failed_at":1513194086.0467746,"retry_count":0},"jobstr":"{\"class\":\"Geo::ProjectSyncWorker\",\"args\":[2729211,\"2017-12-13 19:41:14 +0000\"],\"retry\":3,\"queue\":\"geo_project_sync\",\"dead\":false,\"jid\":\"6b8b9ca0a3c22a31637c41b8\",\"created_at\":1513194074.7552783,\"enqueued_at\":1513194074.762257}"}
2017-12-13_19:41:26.05872 sidekiq-besteffort-01 sidekiq-cluster: 2017-12-13T19:41:26.058Z 130872 TID-owqy4ten4 WARN: GRPC::DeadlineExceeded: 4:Deadline Exceeded
2017-12-13_19:41:26.05926 sidekiq-besteffort-01 sidekiq-cluster: 2017-12-13T19:41:26.059Z 130872 TID-owqy4ten4 WARN: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.4.5-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:46:in `check_status'
2017-12-13_19:41:26.05935 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.4.5-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:186:in `attach_status_results_and_complete_call'
2017-12-13_19:41:26.05940 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.4.5-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:378:in `request_response'
2017-12-13_19:41:26.05943 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.4.5-x86_64-linux/src/ruby/lib/grpc/generic/client_stub.rb:167:in `request_response'
2017-12-13_19:41:26.05946 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.4.5-x86_64-linux/src/ruby/lib/grpc/generic/service.rb:185:in `block (3 levels) in rpc_stub_class'
2017-12-13_19:41:26.05949 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client.rb:127:in `call'
2017-12-13_19:41:26.05952 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/repository_service.rb:13:in `exists?'
2017-12-13_19:41:26.05955 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:109:in `block in exists?'
2017-12-13_19:41:26.05960 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client.rb:232:in `block (2 levels) in migrate'
2017-12-13_19:41:26.05963 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client.rb:274:in `allow_n_plus_1_calls'
2017-12-13_19:41:26.05967 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client.rb:227:in `block in migrate'
2017-12-13_19:41:26.05970 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/influx_db.rb:99:in `measure'
2017-12-13_19:41:26.05972 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client.rb:225:in `migrate'
2017-12-13_19:41:26.05975 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:107:in `exists?'
2017-12-13_19:41:26.05978 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/app/models/repository.rb:503:in `exists?'
2017-12-13_19:41:26.05981 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/instrumentation.rb:159:in `block in _uncached_exists?'
2017-12-13_19:41:26.05984 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/method_call.rb:39:in `measure'
2017-12-13_19:41:26.05987 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/instrumentation.rb:159:in `_uncached_exists?'
2017-12-13_19:41:26.05990 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/service/gitlab-rails/app/models/repository.rb:76:in `block (2 levels) in cache_method'
2017-12-13_19:41:26.05995 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activesupport-4.2.10/lib/active_support/cache.rb:299:in `block in fetch'
2017-12-13_19:41:26.05998 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activesupport-4.2.10/lib/active_support/cache.rb:585:in `block in save_block_result_to_cache'
2017-12-13_19:41:26.06001 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activesupport-4.2.10/lib/active_support/cache.rb:547:in `block in instrument'
2017-12-13_19:41:26.06007 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activesupport-4.2.10/lib/active_support/notifications.rb:164:in `block in instrument'
2017-12-13_19:41:26.06014 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activesupport-4.2.10/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
2017-12-13_19:41:26.06023 sidekiq-besteffort-01 sidekiq-cluster: /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activesupport-4.2.10/lib/active_support/notifications.rb:164:in `instrument'
I'm not sure whether this is an issue with our NFS setup or with Gitaly on the Geo testbed.
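For illustration, the idea in the title (treat a shard as unhealthy when its Gitaly node cannot be reached in time, so that jobs for it are skipped rather than failing) could be sketched roughly like this. Everything here is an assumption made for the example: `ShardHealth`, the `probe` lambda, and the shard names are hypothetical, and a local `DeadlineExceeded` class stands in for `GRPC::DeadlineExceeded` so the snippet runs without the grpc gem. It is not how the GitLab code is actually structured.

```ruby
require 'timeout'

# Hypothetical stand-in for GRPC::DeadlineExceeded, so the sketch
# does not depend on the grpc gem.
class DeadlineExceeded < StandardError; end

# Hypothetical helper: probes each shard once and reports it as unhealthy
# if the probe times out or the node cannot be reached.
class ShardHealth
  # probe    - callable taking a shard name; it should perform a cheap
  #            request against that shard's Gitaly node.
  # deadline - maximum number of seconds to wait for the probe.
  def initialize(probe, deadline: 1.0)
    @probe = probe
    @deadline = deadline
  end

  def healthy?(shard)
    # Raise DeadlineExceeded if the probe itself takes too long.
    Timeout.timeout(@deadline, DeadlineExceeded) { @probe.call(shard) }
    true
  rescue DeadlineExceeded, SystemCallError
    # Deadline exceeded or connection-level failure: mark as unhealthy.
    false
  end

  # Returns [healthy_shards, unhealthy_shards].
  def partition(shards)
    shards.partition { |shard| healthy?(shard) }
  end
end

# Fake probe for the example: one made-up shard always hits the deadline.
probe = lambda do |shard|
  raise DeadlineExceeded, '4:Deadline Exceeded' if shard == 'unreachable-shard'
  true
end

health = ShardHealth.new(probe)
healthy, unhealthy = health.partition(%w[default unreachable-shard])

puts "healthy: #{healthy.inspect}"
puts "unhealthy: #{unhealthy.inspect}"
```

With something along these lines, a scheduler could enqueue Geo::ProjectSyncWorker jobs only for projects on shards in the healthy list, instead of letting each job run into the deadline on its own.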
Edited by Stan Hu