Add ability to re-obtain cache instances from sentinels whenever a ConnectionError occurs (!820) · Merge requests · BuildGrid / buildgrid

Neill Whillans requested to merge neill/sentinel_conn_error into master Aug 23, 2022

Before raising this MR, consider whether the following are required, and complete if so:

Unit tests
Metrics
Documentation update(s)

If not required, please explain in brief why not.

Description

This merge request adds the ability to re-obtain the cache instances (master and replicas) from the discovered list of sentinels whenever a connection error occurs during update_action_result() or get_action_result().

Changes proposed in this merge request:

Move the existing code that discovers the list of sentinels and resolves the master and replicas into a new function, _obtain_cache_instances()
Whenever a connection error arises (ConnectionError or TimeoutError), call the new function and re-attempt the client command that produced the error
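
For reviewers, the intended control flow can be sketched roughly as below. This is an illustration only, not the code in this MR: the class name SentinelBackedCache, the discover callable and the _with_reconnect helper are hypothetical, the built-in ConnectionError and TimeoutError stand in for the Redis client's own exception classes, and a single retry is assumed.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SentinelBackedCache:
    """Hypothetical stand-in for the Redis-backed action cache."""

    def __init__(self, discover: Callable[[], object]) -> None:
        # `discover` is an assumed callable that queries the sentinels and
        # returns a client for the current master.
        self._discover = discover
        self._obtain_cache_instances()

    def _obtain_cache_instances(self) -> None:
        # Re-resolve the master from the sentinels (replicas omitted here).
        self._master = self._discover()

    def _with_reconnect(self, command: Callable[[], T]) -> T:
        try:
            return command()
        except (ConnectionError, TimeoutError):
            # Refresh the instances, then re-attempt the command once.
            # A second failure propagates to the caller.
            self._obtain_cache_instances()
            return command()

    def get_action_result(self, key: str):
        # The lambda reads self._master at call time, so the retry
        # uses the freshly resolved master.
        return self._with_reconnect(lambda: self._master.get(key))
```

Wrapping each client command in a small helper like this keeps the retry policy in one place, so update_action_result() and get_action_result() could share it.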

Validation

Used a Redis Sentinel Docker Compose setup that spins up a number of sentinels, along with master and replica instances (https://www.developers-notebook.com/development/using-redis-sentinel-with-docker-compose/). A local BuildGrid server can then be spun up using the following cache settings (replacing host with the IP address of a sentinel instance):

   caches:
      - !redis-action-cache &build-cache
        storage: *cas-storage
        cache-failed-actions: true
        allow-updates: true
        host: 172.26.0.3
        port: 26379
        sentinel-master-name: redismaster

With a bot instance running (tox -e bot -- host-tools), a recc command can then be run. Two scenarios were tested: a missing master being replaced by one of the replicas once the sentinels agree, and the initially connected sentinel itself being disconnected. Both scenarios were achieved by 'pausing' the appropriate Docker container in the Redis Sentinel setup.

Issues addressed

n/a

Edited Aug 23, 2022 by Neill Whillans
