Enabling / disabling features does not propagate correctly to Geo secondaries

Summary

We use the flipper gem to provide feature gates in GitLab. This seems to store a cache of enabled features in Redis.

In a Geo setup, the primary and secondary do not share Redis state. This means that changes to a feature on the primary do not invalidate the cache on the secondary, leading to inconsistent behaviour.

Steps to reproduce

On the secondary:

# Looks at the database
Feature.enabled?(:gitaly_ref_exists) # false

On the primary:

# Looks at the database
Feature.enabled?(:gitaly_ref_exists) # false
Feature.enable(:gitaly_ref_exists) # true

On the secondary:

# Does not look at the database
Feature.enabled?(:gitaly_ref_exists) # false

Changes to feature flags are only reported by the secondary after its Redis cache has been cleared manually.

Possible fixes

  • Disable the feature cache on the secondary
  • Set a very short expiry on the secondary
  • Send a cache invalidation event via the log cursor whenever features are changed on the primary
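
To make the trade-offs between these options concrete, here is a toy Ruby model of the situation. It is only a sketch: `FeatureStore`, `CachedFeatures`, the `ttl:` and `clock:` parameters and `invalidate` are hypothetical names invented for illustration, not GitLab or flipper APIs. The store stands in for the replicated database, and each `CachedFeatures` instance stands in for one node's private Redis cache.

```ruby
# Toy model only. All class and method names here are hypothetical.

# Stands in for the replicated database that every node can read.
class FeatureStore
  def initialize
    @flags = {}
  end

  def enable(name)
    @flags[name] = true
  end

  def enabled?(name)
    @flags.fetch(name, false)
  end
end

# Stands in for one node's private feature cache (Redis in the real setup).
class CachedFeatures
  # ttl: nil means entries never expire, which models the behaviour reported above.
  # clock: is injectable so expiry can be demonstrated without sleeping.
  def initialize(store, ttl: nil, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = store
    @ttl = ttl
    @clock = clock
    @cache = {}
  end

  def enabled?(name)
    entry = @cache[name]
    return entry[:value] if entry && fresh?(entry)

    # Cache miss or expired entry: fall back to the database and re-cache.
    value = @store.enabled?(name)
    @cache[name] = { value: value, at: @clock.call }
    value
  end

  # Models the third option: drop the entry when an invalidation event arrives.
  def invalidate(name)
    @cache.delete(name)
  end

  private

  def fresh?(entry)
    @ttl.nil? || (@clock.call - entry[:at]) < @ttl
  end
end

store = FeatureStore.new
now = 0.0
clock = -> { now }

no_expiry    = CachedFeatures.new(store, clock: clock)          # current behaviour
short_expiry = CachedFeatures.new(store, ttl: 5, clock: clock)  # second option

no_expiry.enabled?(:gitaly_ref_exists)    # false, now cached
short_expiry.enabled?(:gitaly_ref_exists) # false, now cached

store.enable(:gitaly_ref_exists)          # change made "on the primary"
now += 10                                 # ten seconds pass

no_expiry.enabled?(:gitaly_ref_exists)    # false, still stale
short_expiry.enabled?(:gitaly_ref_exists) # true, entry expired and was re-read

no_expiry.invalidate(:gitaly_ref_exists)  # third option: explicit invalidation
no_expiry.enabled?(:gitaly_ref_exists)    # true
```

In this model, `ttl: 0` approximates the first option, since every lookup falls through to the database. A short expiry bounds the staleness at the cost of extra database reads, while an invalidation event removes the delay but depends on the event actually reaching the secondary.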

Note

This issue is referenced in https://docs.gitlab.com/ee/administration/geo/disaster_recovery/background_verification.html. Please update the documentation once this issue is fixed.

/cc @jramsay @stanhu this one has surprised me twice in recent days. It could lead to some very unexpected outcomes on gprd.

Edited by Valery Sizov