Use GitLab feature flag feature

Investigate using https://gitlab.com/help/user/project/operations/feature_flags, which was just released.

Initial configuration

  • Where should the configuration live: gitlab.yml/gitlab.rb, ApplicationSetting, or environment variables?
  • Unleash.app_name should differ per environment: dev.gitlab.org, staging.gitlab.com, canary.gitlab.com, gitlab.com
  • Which GitLab project should be set as config.url? A project in ops.gitlab.net, gitlab.com, others?
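To make the environment-variable option concrete, here is a minimal sketch. The variable names (GITLAB_UNLEASH_URL, GITLAB_UNLEASH_INSTANCE_ID), the app names, and the helper unleash_settings are all hypothetical; only the four hosts come from the list above.

```ruby
# Hypothetical mapping from instance host to the Unleash app name.
# The hosts come from the list above; the app names are assumptions.
UNLEASH_APP_NAMES = {
  'dev.gitlab.org'     => 'dev',
  'staging.gitlab.com' => 'staging',
  'canary.gitlab.com'  => 'canary',
  'gitlab.com'         => 'production'
}.freeze

# Builds the client settings from environment variables (names are hypothetical).
# Returns nil when no server URL is set, so callers can skip initialization.
def unleash_settings(host, env = ENV)
  url = env['GITLAB_UNLEASH_URL']
  return nil if url.nil? || url.empty?

  {
    url: url,
    app_name: UNLEASH_APP_NAMES.fetch(host, 'development'),
    instance_id: env.fetch('GITLAB_UNLEASH_INSTANCE_ID', '')
  }
end
```

The resulting values would then be assigned to config.url and config.app_name when the client is configured. Returning nil for a missing URL keeps the client disabled by default on instances that have not opted in.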

Flexibility - Gate vs Strategy

  • How can we support per-project/group/user gating with unleash? i.e. Feature.enabled?(:a_feature, project)
  • How can we support rollout strategy with project/group/user context?
  • Flipper uses the concept of a Gate to evaluate a flag per condition, whereas Unleash uses the concept of a Strategy for the same purpose.
  • Gate usage statistics today: 41 flags are used without context, and 45 flags are used with one of project_id, group_id or user_id.
  • Today, we encourage employees to use a gate with one of the parameters - project, group or user https://docs.gitlab.com/ee/development/feature_flags/development.html
  • Should we allow users to define arbitrary strategies in the GitLab Feature Flag system?
  • Should we have a custom strategy? e.g. https://gitlab.com/snippets/1890628
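As an illustration of how a Flipper-style project gate might map onto a strategy, here is a minimal sketch. It is not the snippet linked above. The class name, the strategy name, the projectIds parameter and the FlagContext struct are all hypothetical; the name / is_enabled?(params, context) shape is assumed to be what a custom strategy needs to expose.

```ruby
# Stand-in for the evaluation context; a real client context may look different.
FlagContext = Struct.new(:project_id, :group_id, :user_id, keyword_init: true)

# Hypothetical strategy: enabled when the context's project ID is listed in
# the strategy parameters, e.g. { 'projectIds' => '1, 2' }.
class ProjectIdStrategy
  PARAM = 'projectIds'

  def name
    'projectWithId'
  end

  def is_enabled?(params = {}, context = nil)
    # Without a project in the context the gate cannot match.
    return false if context.nil? || context.project_id.nil?

    ids = params.fetch(PARAM, '').split(',').map(&:strip)
    ids.include?(context.project_id.to_s)
  end
end
```

With something like this, Feature.enabled?(:a_feature, project) could translate into a context carrying project.id. Group and user gates would follow the same pattern with their own parameters.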

Optimization

Fetching

  • The ETag cache does not seem to work yet. With it, unleash-client can skip re-reading unchanged flag data, which reduces network I/O.
  • Unleash-client fetches flag values per 15sec by default (polling). => It's supported by default.
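The saving from ETag can be sketched as follows. This is an assumption about how a conditional fetch could behave, not a description of the client's current code. FlagPoller and the injected fetcher are hypothetical; the fetcher stands in for an HTTP request that sends the last ETag in If-None-Match.

```ruby
# Remembers the last ETag and payload so an unchanged response can be skipped.
class FlagPoller
  attr_reader :flags

  # fetcher: callable taking the last ETag (or nil) and returning
  # [status, etag, body]. It stands in for the real HTTP request.
  def initialize(fetcher)
    @fetcher = fetcher
    @etag = nil
    @flags = nil
  end

  # Returns true when new flag data was stored, and false on
  # 304 Not Modified or any other non-200 status.
  def poll
    status, etag, body = @fetcher.call(@etag)
    return false unless status == 200

    @etag = etag
    @flags = body
    true
  end
end
```

On a 304 the previously stored flags stay in place, so each 15-second poll costs only a header exchange when nothing has changed.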

Reading

  • Flipper supports L1/L2 caches. Does the Unleash Ruby client need to support them too?
  • In a recent incident, feature flags contributed to performance degradation => production#928 (comment 187441674)
  • Each Unleash.is_enabled? call walks through the strategies, and the computed values are not cached yet. i.e. if the same flag is read multiple times in a single thread, the computation happens every time.
  • Should unleash-client cache computed flag values in Gitlab::SafeRequestStore (per-request global ivar)?
  • Flipper automatically memoizes requested flag statuses (flip.memoize = true, maybe per-request memoization)
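Per-request memoization of computed values could look roughly like the sketch below. MemoizedFlags is a hypothetical wrapper, the evaluator is an injected callable that stands in for the strategy walk, and a plain Hash stands in for the per-request store.

```ruby
# Memoizes computed flag values in a per-request store.
# A plain Hash stands in for the store; using Gitlab::SafeRequestStore
# in its place is an assumption.
class MemoizedFlags
  def initialize(store, evaluator)
    @store = store          # expected to be cleared at the end of each request
    @evaluator = evaluator  # callable that walks the strategies
  end

  def enabled?(name)
    key = "unleash:#{name}"
    # Checking key? means false results are cached too, unlike `||=`.
    return @store[key] if @store.key?(key)

    @store[key] = @evaluator.call(name)
  end
end
```

This version ignores context. If a flag is evaluated with a project, group or user, the cache key would need to include that context as well.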

Control/Chatops vs UI

  • GitLab as an Unleash server doesn't provide a public API yet, so employees cannot change flag values via chatops.
  • As a quick win, should we allow users to update strategies to any value (via a public API)? e.g. /chatops run feature set new_navigation_bar 25 --dev (See more https://docs.gitlab.com/ee/development/feature_flags/controls.html) => https://gitlab.com/gitlab-org/gitlab-ee/issues/9566
  • GitLab as an Unleash server allows employees to change the flag via the UI instead. Screenshot_from_2019-08-29_14-37-14
  • This UI doesn't allow you to set gate parameters (See above)

Monitoring/Logging

  • Monitor unleash-client health. Create a new log file to monitor unleash-client's activity, system failure, etc. It should use structured logging, which can be viewed/indexed in ElasticSearch and Kibana.
  • Grafana https://dashboards.gitlab.net/d/000000126/grape-endpoints?orgId=1&var-action=Grape%23GET%20%2Fapi%2Ffeature_flags%2Funleash%2F:project_id%2Fclient%2Ffeatures&var-database=influxdb-01-inf-gprd
  • Exception tracking is Sentry, as always.
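A dedicated structured log could be produced with the standard Logger and a JSON formatter, as in this sketch. The helper name and the field names are assumptions, not an existing GitLab log schema.

```ruby
require 'json'
require 'logger'
require 'time'

# Builds a logger that writes one JSON object per line, which is the shape
# log shippers for ElasticSearch/Kibana usually expect.
# The field names are assumptions, not an existing GitLab log schema.
def build_unleash_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
    JSON.generate({ severity: severity, time: time.utc.iso8601(3) }.merge(payload)) + "\n"
  end
  logger
end
```

For example, build_unleash_logger($stdout).info({ event: 'fetch', status: 200 }) emits a single JSON line with severity, time, event and status keys.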

Resiliency/Fallback plan

  • If unleash-client puts pressure on production and SREs decide it must be turned off immediately, how can we turn it off and fall back to the existing behavior? Feature.enabled?(:unleash_server_enabled) seems necessary.
  • When the polling thread of unleash-client dies, how can we recover it without restarting the entire Rails fleet?
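One possible shape for the kill switch and fallback is sketched below. FlagRouter is hypothetical and all three collaborators are injected callables, so nothing here is an existing API; the kill switch would be something like Feature.enabled?(:unleash_server_enabled).

```ruby
# Routes a flag check to the new client, guarded by a kill switch, and falls
# back to the existing check when the switch is off or the client fails.
class FlagRouter
  def initialize(kill_switch:, unleash:, flipper:)
    @kill_switch = kill_switch # e.g. -> { Feature.enabled?(:unleash_server_enabled) }
    @unleash = unleash         # callable for the new client
    @flipper = flipper         # callable for the existing behavior
  end

  def enabled?(name)
    if @kill_switch.call
      begin
        return @unleash.call(name)
      rescue StandardError
        # Any client failure falls through to the existing behavior.
      end
    end

    @flipper.call(name)
  end
end
```

Because the kill switch is itself read through the existing flag system, turning it off needs no deploy or restart. This sketch does not address restarting a dead polling thread.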

Evaluation plan

  • https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7666
  • Environments: dev.gitlab.org, staging.gitlab.com, canary.gitlab.com and gitlab.com

Documentation/Education

  • Update documentation to encourage employees to use Unleash-FF for their features https://docs.gitlab.com/ee/development/feature_flags/development.html
  • Update documentation about how to control Unleash-FF https://docs.gitlab.com/ee/development/feature_flags/controls.html
  • Update the RuboCop rule to ban Flipper.enabled?.
  • Provide helper methods for RSpec, e.g. stub_feature_flags

HA

  • As long as it stays within GitLab-Rails, it's automatically HA.

Geo

  • If the primary and secondary nodes all point to the same GitLab instance as the Unleash server, updated values are automatically synchronized across all nodes (because of polling).

On-prem/Omnibus GitLab

  • Where does it store FF values? In the production DB?
  • Create a hidden project for the control panel?
  • Provide a console command (via gitlab-rails console) to control their flags e.g. https://docs.gitlab.com/ee/administration/job_traces.html#enabling-live-trace

Transition period

  • How do we handle existing Flipper-FF? Should we migrate?

How the system checks a feature on/off with unleash

sequenceDiagram
    participant postgres
    participant unleash server
    participant unleash client
    participant global var
    participant local storage
    participant FeatureA
    loop Polling every 15 sec
      unleash client->>unleash server: Request flags api/v4/feature_flags/unleash/:id
      activate unleash server
      unleash server->>postgres: Retrieving flag data from DB
      activate postgres
      postgres-->>unleash server: Return flag data
      deactivate postgres
      unleash server-->>unleash client: Return flag data
      deactivate unleash server
      unleash client->>local storage: Write flag data as backup file
      unleash client->>global var: Write flag data in memory
    end
    Note left of FeatureA: FeatureA checks if the flag is on
    FeatureA->>unleash client: Unleash.is_enabled?(:feature_a)
    activate unleash client
    unleash client->>global var: Read flag data from memory
    activate global var
    global var-->>unleash client: Return flag data
    deactivate global var
    unleash client->>unleash client: Evaluate with strategies
    unleash client-->>FeatureA: Return true/false
    deactivate unleash client
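
The flow in the diagram above can be approximated with the sketch below. FlagRepository is hypothetical, strategy evaluation is reduced to a plain 'enabled' boolean, and restoring from the backup file is an assumption about what the backup is for.

```ruby
require 'json'

# In-memory flag store refreshed by a polling step; reads never hit the network.
class FlagRepository
  def initialize(backup_path)
    @backup_path = backup_path
    @flags = {}
    @lock = Mutex.new
  end

  # One polling step: write a backup to disk, then swap the in-memory copy.
  def refresh(flag_data)
    File.write(@backup_path, JSON.generate(flag_data))
    @lock.synchronize { @flags = flag_data }
  end

  # Loads the last backup, e.g. at boot when the server is unreachable.
  def restore
    return unless File.exist?(@backup_path)

    data = JSON.parse(File.read(@backup_path))
    @lock.synchronize { @flags = data }
  end

  # Reads from memory only. A real client would evaluate strategies here.
  def enabled?(name)
    flag = @lock.synchronize { @flags[name.to_s] }
    !flag.nil? && flag['enabled'] == true
  end
end
```

The key property is that enabled? touches only process memory, so the cost per check is independent of the server and the database.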

How the system checks a feature on/off with flipper

sequenceDiagram
  participant flipper 
  participant ThreadCache
  participant redis
  participant postgres
  participant FeatureA 
  Note left of FeatureA: FeatureA checks if the flag is on
  FeatureA->>flipper:Feature.enabled?(:feature_a)
  activate flipper
  flipper->>ThreadCache:Try to read flag data
  activate ThreadCache
  ThreadCache-->>flipper:Return flag data if exist
  deactivate ThreadCache
  opt if flag data is not found
  flipper->>redis:Try to read flag data
  activate redis
  redis-->>flipper:Return flag data if exist
  deactivate redis
  flipper->>ThreadCache: Cache flag data
  end
  opt if flag data is not found
  flipper->>postgres:Try to read flag data
  activate postgres
  postgres-->>flipper:Return flag data if exist
  deactivate postgres
  flipper->>ThreadCache: Cache flag data
  flipper->>redis: Cache flag data
  end
  flipper->>flipper: Evaluate with gate
  flipper-->>FeatureA: Return true/false
  deactivate flipper
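
The layered read in the diagram above can be sketched as follows. This is not Flipper's actual adapter code. LayeredFlagReader is hypothetical, and plain Hashes stand in for the thread cache, Redis and PostgreSQL.

```ruby
# Reads a flag through three layers and back-fills the faster layers on a miss.
# Plain Hashes stand in for the thread cache, Redis and PostgreSQL.
class LayeredFlagReader
  def initialize(thread_cache:, redis:, postgres:)
    @thread_cache = thread_cache
    @redis = redis
    @postgres = postgres
  end

  def read(name)
    # L1: per-thread memory.
    return @thread_cache[name] if @thread_cache.key?(name)

    # L2: Redis. On a hit, back-fill L1.
    if @redis.key?(name)
      @thread_cache[name] = @redis[name]
      return @thread_cache[name]
    end

    # Source of truth: PostgreSQL. On a hit, back-fill both caches.
    return nil unless @postgres.key?(name)

    value = @postgres[name]
    @redis[name] = value
    @thread_cache[name] = value
  end
end
```

After the first read of a flag, later reads in the same thread are served from memory, which is the behavior an Unleash-based replacement would be compared against.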

Slack

f_feature_flag

/cc @rspeicher @yorickpeterse

Edited Apr 13, 2021 by Orit Golowinski