Pass all resolved sentinel instances to Sentinel in _resolve_master
Description
When constructing a redis.Sentinel object, multiple host/port pairs can be passed, corresponding to the locations of all known Sentinels. This allows the library to cope when the sentinel we're currently talking to goes down. This PR simplifies the logic of _resolve_master() to take advantage of the built-in features of the redis.Sentinel class by passing all host/port pairs directly to it, instead of only using one at a time.
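For illustration, here is a minimal sketch of handing every known sentinel address to a single redis.Sentinel. It is not the code in this PR: the helper names (parse_sentinel_addresses, make_sentinel) and the sentinel-N hostnames are made up for the example, and 26379 is assumed as the port because it is the conventional Sentinel default.

```python
from typing import Iterable, List, Tuple


def parse_sentinel_addresses(
    entries: Iterable[str], default_port: int = 26379
) -> List[Tuple[str, int]]:
    """Turn "host" or "host:port" strings into (host, port) tuples."""
    addresses = []
    for entry in entries:
        host, _, port = entry.strip().partition(":")
        addresses.append((host, int(port) if port else default_port))
    return addresses


def make_sentinel(addresses: List[Tuple[str, int]], **connection_kwargs):
    """Build one redis.Sentinel that knows about every sentinel address."""
    # Imported lazily so the parsing helper works without redis-py installed.
    from redis.sentinel import Sentinel

    # All host/port pairs go in at once, rather than one at a time.
    return Sentinel(addresses, **connection_kwargs)


# Hypothetical hostnames, for illustration only.
addresses = parse_sentinel_addresses(
    ["sentinel-0:26379", "sentinel-1:26379", "sentinel-2"]
)

# With redis-py installed and sentinels reachable, usage would look like:
# sentinel = make_sentinel(addresses, socket_timeout=0.5)
# master = sentinel.master_for("buildgrid-test")
# replica = sentinel.slave_for("buildgrid-test")
```

Because the Sentinel object holds the full list, it can try the next address itself when one sentinel is unreachable, so the caller no longer needs to track which one to use.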
The old version of the code appeared to reorder the list of sentinels based on which one we last connected to, but the class also does this on its own if we let it. Leveraging that allows all of that code to be removed.
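As I understand it, the reordering the class performs amounts to moving the sentinel that answered to the front of the list, so it is tried first next time. The sketch below is a standalone simulation of that idea under that assumption, not redis-py's actual code; first_responding and the is_up callback are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]


def first_responding(
    sentinels: List[Address], is_up: Callable[[Address], bool]
) -> Optional[Address]:
    """Return the first reachable sentinel and swap it to the front of the list."""
    for index, address in enumerate(sentinels):
        if is_up(address):
            # Swap the responder into slot 0 so the next lookup tries it first.
            sentinels[0], sentinels[index] = sentinels[index], sentinels[0]
            return address
    return None


# Hypothetical addresses; pretend the first sentinel is down.
sentinels = [("sentinel-0", 26379), ("sentinel-1", 26379), ("sentinel-2", 26379)]
down = {("sentinel-0", 26379)}
chosen = first_responding(sentinels, lambda addr: addr not in down)
```

After the call, the sentinel that responded sits at the head of the list and the unreachable one has moved back, which is the bookkeeping the old _resolve_master() code was doing by hand.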
This should prevent errors like the ones I was seeing below:
2022-07-25 14:31:27,146:[buildgrid.server.actioncache.service][ERROR][gRPC_Executor_0]: Unexpected error in GetActionResult; request=[instance_name: "dev"
action_digest {
  hash: "5fbdf4795187969461269b18044a52c26c880a4095a72e192360ad9b30cb6f78"
  size_bytes: 275
}
inline_stdout: true
inline_stderr: true
inline_output_files: "test.o"
]
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/buildgrid/server/cas/storage/redis.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/buildgrid/server/actioncache/caches/redis_cache.py", line 105, in get_action_result
    action_result = self._get_action_result(key, action_digest)
  File "/usr/lib/python3.8/site-packages/buildgrid/server/actioncache/caches/redis_cache.py", line 159, in _get_action_result
    value_in_cache = self._client_replica.get(key)
  File "/usr/lib/python3.8/site-packages/redis/commands/core.py", line 1233, in get
    return self.execute_command("GET", name)
  File "/usr/lib/python3.8/site-packages/redis/client.py", line 1173, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/lib/python3.8/site-packages/redis/connection.py", line 1370, in get_connection
    connection.connect()
  File "/usr/lib/python3.8/site-packages/redis/sentinel.py", line 54, in connect
    return self.retry.call_with_retry(
  File "/usr/lib/python3.8/site-packages/redis/retry.py", line 50, in call_with_retry
    raise error
  File "/usr/lib/python3.8/site-packages/redis/retry.py", line 45, in call_with_retry
    return do()
  File "/usr/lib/python3.8/site-packages/redis/sentinel.py", line 46, in _connect_retry
    for slave in self.connection_pool.rotate_slaves():
  File "/usr/lib/python3.8/site-packages/redis/sentinel.py", line 141, in rotate_slaves
    raise SlaveNotFoundError(f"No slave found for {self.service_name!r}")
redis.sentinel.SlaveNotFoundError: No slave found for 'buildgrid-test'
Edited by Jeremiah Bonney