
Pass all resolved sentinel instances to Sentinel in _resolve_master

Jeremiah Bonney requested to merge jbonney/multiple-sentinals into master

Description

When constructing a redis.Sentinel object, you can pass multiple host/port pairs, one for each known Sentinel. This lets the library keep working if the Sentinel we are currently talking to goes down. This PR simplifies _resolve_master() to take advantage of this built-in behavior of the redis.Sentinel class: it passes all host/port pairs to the constructor at once, instead of using only one at a time.

The old code appeared to reorder the list of Sentinels based on which one we last connected to, but the class already does this on its own if we let it. Relying on that lets us remove all of that code.
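
The reordering the class already performs can be modelled in a few lines of plain Python. This is a simplified sketch of the idea behind redis-py's `Sentinel.discover_master`, not its actual code: the real method also validates the reported master state and raises `MasterNotFoundError`, whereas here the `query` callable, the `SentinelUnavailable` exception, and the `LookupError` are stand-ins chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]


class SentinelUnavailable(Exception):
    """Hypothetical stand-in for a connection error to one Sentinel."""


def discover_master(
    sentinels: List[Address],
    query: Callable[[Address], Optional[Address]],
) -> Address:
    """Ask each Sentinel in turn for the master address.

    Unreachable Sentinels are skipped; the first one that answers is
    swapped into position 0 so it is asked first on the next lookup.
    """
    errors = []
    for index, sentinel in enumerate(sentinels):
        try:
            master = query(sentinel)
        except SentinelUnavailable as exc:
            errors.append(f"{sentinel}: {exc}")
            continue
        if master is not None:
            # Promote the responsive Sentinel to the front of the list.
            sentinels[0], sentinels[index] = sentinels[index], sentinels[0]
            return master
    raise LookupError("no Sentinel reported a master: " + "; ".join(errors))


# Usage: the first Sentinel is down, so the second one answers.
sentinels = [("s0", 26379), ("s1", 26379), ("s2", 26379)]


def fake_query(addr: Address) -> Optional[Address]:
    if addr == ("s0", 26379):
        raise SentinelUnavailable("connection refused")
    return ("10.0.0.5", 6379)


print(discover_master(sentinels, fake_query))  # ('10.0.0.5', 6379)
print(sentinels)  # [('s1', 26379), ('s0', 26379), ('s2', 26379)]
```

After the first lookup `s1` sits at the front, so later lookups reach a working Sentinel immediately instead of retrying the one that is down.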

This should prevent errors like the one below, which I was seeing:

```
2022-07-25 14:31:27,146:[buildgrid.server.actioncache.service][ERROR][gRPC_Executor_0]: Unexpected error in GetActionResult; request=[instance_name: "dev"
action_digest {
  hash: "5fbdf4795187969461269b18044a52c26c880a4095a72e192360ad9b30cb6f78"
  size_bytes: 275
}
inline_stdout: true
inline_stderr: true
inline_output_files: "test.o"
]
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/buildgrid/server/cas/storage/redis.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/buildgrid/server/actioncache/caches/redis_cache.py", line 105, in get_action_result
    action_result = self._get_action_result(key, action_digest)
  File "/usr/lib/python3.8/site-packages/buildgrid/server/actioncache/caches/redis_cache.py", line 159, in _get_action_result
    value_in_cache = self._client_replica.get(key)
  File "/usr/lib/python3.8/site-packages/redis/commands/core.py", line 1233, in get
    return self.execute_command("GET", name)
  File "/usr/lib/python3.8/site-packages/redis/client.py", line 1173, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/lib/python3.8/site-packages/redis/connection.py", line 1370, in get_connection
    connection.connect()
  File "/usr/lib/python3.8/site-packages/redis/sentinel.py", line 54, in connect
    return self.retry.call_with_retry(
  File "/usr/lib/python3.8/site-packages/redis/retry.py", line 50, in call_with_retry
    raise error
  File "/usr/lib/python3.8/site-packages/redis/retry.py", line 45, in call_with_retry
    return do()
  File "/usr/lib/python3.8/site-packages/redis/sentinel.py", line 46, in _connect_retry
    for slave in self.connection_pool.rotate_slaves():
  File "/usr/lib/python3.8/site-packages/redis/sentinel.py", line 141, in rotate_slaves
    raise SlaveNotFoundError(f"No slave found for {self.service_name!r}")
redis.sentinel.SlaveNotFoundError: No slave found for 'buildgrid-test'
```