
Add ability to re-obtain cache instances from sentinels whenever a ConnectionError occurs

Neill Whillans requested to merge neill/sentinel_conn_error into master

Before raising this MR, consider whether the following are required, and complete if so:

  • Unit tests
  • Metrics
  • Documentation update(s)

If not required, please explain in brief why not.

Description

This request adds the ability to re-obtain the cache instances (master and replicas) from the discovered list of sentinels whenever a connection error occurs during update_action_result() or get_action_result().

Changes proposed in this merge request:

  • Move the existing code that discovers the list of sentinels and resolves the master and replicas into a new function, _obtain_cache_instances()
  • Whenever a connection error arises (ConnectionError or TimeoutError), call the new function and re-attempt the client command that produced the error
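
For reviewers, the retry flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the class name SentinelBackedCache, the discover callable, the retry_on parameter and the _with_rediscovery helper are all hypothetical, and the get/set calls stand in for whatever client commands the cache actually issues. Only _obtain_cache_instances(), update_action_result() and get_action_result() are names taken from the description. The sketch defaults to Python's built-in ConnectionError and TimeoutError so that it runs without Redis; with redis-py, the equivalents in redis.exceptions would be passed as retry_on instead.

```python
from typing import Any, Callable, Tuple, TypeVar

T = TypeVar("T")


class SentinelBackedCache:
    """Hypothetical stand-in for the Redis action cache (illustration only)."""

    def __init__(
        self,
        discover: Callable[[], Tuple[Any, Any]],
        retry_on: Tuple[type, ...] = (ConnectionError, TimeoutError),
    ) -> None:
        # `discover` is an assumed hook that asks the sentinels for the
        # current topology and returns (master_client, replica_client).
        self._discover = discover
        self._retry_on = retry_on
        self._obtain_cache_instances()

    def _obtain_cache_instances(self) -> None:
        # Resolve master and replica afresh from the sentinels.
        self._master, self._replica = self._discover()

    def _with_rediscovery(self, command: Callable[[], T]) -> T:
        # Run the command; on a connection error, re-resolve the
        # instances and re-attempt the same command exactly once.
        try:
            return command()
        except self._retry_on:
            self._obtain_cache_instances()
            return command()

    def get_action_result(self, key: str) -> Any:
        # The lambda reads self._replica at call time, so the retry
        # uses the freshly obtained instance.
        return self._with_rediscovery(lambda: self._replica.get(key))

    def update_action_result(self, key: str, value: Any) -> None:
        self._with_rediscovery(lambda: self._master.set(key, value))
```

In this sketch a second failure after rediscovery propagates to the caller rather than looping, which keeps a persistent outage from blocking the request indefinitely; whether the actual change retries once or more is not stated here.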

Validation

Validated against a Redis Sentinel Docker setup that spins up a number of sentinels, along with master and replica instances (https://www.developers-notebook.com/development/using-redis-sentinel-with-docker-compose/). A local BuildGrid server can then be spun up using the following cache settings (replacing host with the IP address of a sentinel instance):

   caches:
      - !redis-action-cache &build-cache
        storage: *cas-storage
        cache-failed-actions: true
        allow-updates: true
        host: 172.26.0.3
        port: 26379
        sentinel-master-name: redismaster

With a bots instance running (tox -e bot -- host-tools), a recc command can then be run. Two scenarios were tested: one where a missing master is replaced by one of the replicas once the sentinels agree, and one where the initially connected sentinel is also disconnected. Both scenarios were achieved by 'pausing' the appropriate Docker container in the Redis Sentinel setup.

Issues addressed

n/a

Edited by Neill Whillans
