Use GitLab feature flag feature

Investigate using the recently released GitLab Feature Flags feature: https://gitlab.com/help/user/project/operations/feature_flags

Initial configuration

Where should the configuration live: gitlab.yml/gitlab.rb, ApplicationSetting, or environment variables?
Unleash.app_name should be distinct per environment: dev.gitlab.org, staging.gitlab.com, canary.gitlab.com, and gitlab.com.
Which GitLab project should be set as config.url? A project in ops.gitlab.net, gitlab.com, others?
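
As a rough illustration of the environment-variable option, the sketch below derives a per-environment app name from the host and reads the remaining settings from the environment. The variable names UNLEASH_URL and UNLEASH_INSTANCE_ID, the helper unleash_settings, the app names, and the example URL are all assumptions made for this sketch, not existing settings.

```ruby
# Hypothetical mapping from the host a Rails node serves to the Unleash app name.
# The hosts come from the list above; the app names are illustrative only.
APP_NAMES = {
  'dev.gitlab.org'     => 'dev',
  'staging.gitlab.com' => 'staging',
  'canary.gitlab.com'  => 'canary',
  'gitlab.com'         => 'production'
}.freeze

# Builds the settings the client would need from environment-style input.
# UNLEASH_URL and UNLEASH_INSTANCE_ID are assumed variable names.
def unleash_settings(host, env = ENV)
  {
    app_name: APP_NAMES.fetch(host) { raise ArgumentError, "unknown host: #{host}" },
    url: env.fetch('UNLEASH_URL'),
    instance_id: env.fetch('UNLEASH_INSTANCE_ID')
  }
end

settings = unleash_settings(
  'staging.gitlab.com',
  {
    'UNLEASH_URL' => 'https://gitlab.example.com/api/v4/feature_flags/unleash/1',
    'UNLEASH_INSTANCE_ID' => 'placeholder-instance-id'
  }
)
puts settings[:app_name]
```

In an initializer, these values could then be assigned to config.app_name, config.url, and config.instance_id inside an Unleash.configure block.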

Flexibility - Gate vs Strategy

How can we support per-project/group/user gating with unleash? i.e. Feature.enabled?(:a_feature, project)
How can we support rollout strategy with project/group/user context?
Flipper uses Gates to enable flags per condition, whereas Unleash uses Strategies for the same purpose.
Gate usage today: 41 flags are checked without context, and 45 flags are checked with one of project_id, group_id or user_id.
Today, we encourage employees to use a gate with one of the parameters (project, group or user): https://docs.gitlab.com/ee/development/feature_flags/development.html
Should we allow users to define arbitrary strategies in the GitLab Feature Flag system?
Should we have a custom strategy? e.g. https://gitlab.com/snippets/1890628
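
To make the custom strategy question concrete, here is a minimal sketch of a per-project strategy. It follows the name / is_enabled?(params, context) shape that Unleash strategies use, but it deliberately does not depend on the unleash gem. The Context struct, the class name, the strategy name projectId, and the projectIds parameter are all hypothetical.

```ruby
# Stand-in for the context object passed to a strategy; only the field used here.
Context = Struct.new(:properties)

# Hypothetical custom strategy that enables a flag for an allow-list of project IDs.
class ProjectIdStrategy
  PARAM = 'projectIds' # assumed parameter name, configured on the server side

  def name
    'projectId'
  end

  def is_enabled?(params = {}, context = nil)
    return false unless params.is_a?(Hash) && params[PARAM].is_a?(String)
    return false if context.nil? || context.properties.nil?

    # Parse "12, 34" into ["12", "34"], ignoring blanks from stray commas.
    allowed = params[PARAM].split(',').map(&:strip).reject(&:empty?)
    allowed.include?(context.properties[:project_id].to_s)
  end
end

strategy = ProjectIdStrategy.new
params = { 'projectIds' => '12, 34' }
puts strategy.is_enabled?(params, Context.new({ project_id: 34 })) # listed project
puts strategy.is_enabled?(params, Context.new({ project_id: 99 })) # unlisted project
```

A call such as Feature.enabled?(:a_feature, project) could map onto this by placing project.id into the context before evaluation. Group and user gating would need equivalent strategies or a single strategy with several parameters.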

Optimization

Fetching

ETag caching does not seem to work yet. With it, unleash-client could skip re-downloading unchanged flag data, which would reduce network I/O.
unleash-client fetches flag values every 15 seconds by default (polling), so periodic refresh is supported out of the box.
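
The ETag point can be sketched as conditional polling: the client remembers the last ETag, sends it as If-None-Match, and keeps its cached flags when the server answers 304. The ConditionalFetcher class and the fake server below are illustrative stand-ins, not the actual unleash-client internals.

```ruby
# Minimal sketch of ETag-based conditional polling. `transport` is any callable
# that takes request headers and returns [status, response_headers, body];
# in a real client it would wrap the HTTP call to the Unleash endpoint.
class ConditionalFetcher
  attr_reader :flags

  def initialize(transport)
    @transport = transport
    @etag = nil
    @flags = nil
  end

  # Returns true when new flag data was stored, false when the server
  # answered 304 Not Modified and the cached copy was kept.
  def poll
    headers = @etag ? { 'If-None-Match' => @etag } : {}
    status, response_headers, body = @transport.call(headers)
    return false if status == 304

    @etag = response_headers['ETag']
    @flags = body
    true
  end
end

# Fake server: replies 304 when the client already holds the current ETag.
server = lambda do |headers|
  current = '"v1"'
  if headers['If-None-Match'] == current
    [304, {}, nil]
  else
    [200, { 'ETag' => current }, '{"features":[]}']
  end
end

fetcher = ConditionalFetcher.new(server)
puts fetcher.poll # first poll downloads the payload
puts fetcher.poll # second poll is answered with 304
```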

Reading

Flipper supports an L1/L2 cache. Does unleash-ruby-client need to support it?
A recent incident in which feature flags contributed to performance degradation: production#928 (comment 187441674)
Each Unleash.is_enabled? call walks through the strategies, and the computed values are not cached yet, i.e. if the same flag is read multiple times in a single thread, the computation happens every time.
Should unleash-client cache the computed flag value in Gitlab::SafeRequestStore (per-request global ivar)?
Flipper automatically memoizes requested flag statuses (flip.memoize = true, maybe per-request memoization).
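
A minimal sketch of the per-request caching idea follows. A plain Hash stands in for the per-request store and a lambda stands in for the flag evaluation, so MemoizedFlags and the key format are assumptions for illustration only. Note that this sketch keys on the flag name alone, so a version that passes a context (project, group, user) would need the context in the key as well.

```ruby
# Wraps a flag evaluator and memoizes results in a per-request store.
# `store` is a plain Hash here, standing in for a store that is reset
# between requests. `evaluator` stands in for the Unleash flag check.
class MemoizedFlags
  def initialize(evaluator, store)
    @evaluator = evaluator
    @store = store
  end

  def enabled?(name)
    key = "unleash:#{name}"
    # Use key? rather than ||= so that a cached `false` is not recomputed.
    return @store[key] if @store.key?(key)

    @store[key] = @evaluator.call(name)
  end
end

calls = 0
evaluator = lambda do |name|
  calls += 1
  name == :feature_a
end
flags = MemoizedFlags.new(evaluator, {})

3.times { flags.enabled?(:feature_a) }
2.times { flags.enabled?(:feature_b) }
puts calls # number of times the strategies were actually evaluated
```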

Control/Chatops vs UI

GitLab as an Unleash server doesn't offer a public API for changing flags yet, so employees cannot change flag values via ChatOps.
As a quick win, should we allow users to update strategies to any values (via a public API)? e.g. /chatops run feature set new_navigation_bar 25 --dev (See more https://docs.gitlab.com/ee/development/feature_flags/controls.html) => https://gitlab.com/gitlab-org/gitlab-ee/issues/9566
GitLab as an Unleash server allows employees to change flags via the UI instead.
This UI doesn't allow you to set gate parameters (see above).

Monitoring/Logging

Monitor unleash-client health. Create a new log file to monitor unleash-client's activity, system failure, etc. It should use structured logging, which can be viewed/indexed in ElasticSearch and Kibana.
Grafana https://dashboards.gitlab.net/d/000000126/grape-endpoints?orgId=1&var-action=Grape%23GET%20%2Fapi%2Ffeature_flags%2Funleash%2F:project_id%2Fclient%2Ffeatures&var-database=influxdb-01-inf-gprd
Exception tracking is Sentry, as always.
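
For the structured logging item, one possible shape is a Logger whose formatter emits one JSON object per line. The helper name build_unleash_logger, the field names, and the event names are illustrative assumptions.

```ruby
require 'json'
require 'logger'
require 'stringio'
require 'time'

# Builds a logger that writes one JSON object per line, a shape that is easy
# to index in Elasticsearch and browse in Kibana. Field names are illustrative.
def build_unleash_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
    JSON.generate({ severity: severity, time: time.utc.iso8601(3) }.merge(payload)) + "\n"
  end
  logger
end

io = StringIO.new # a real setup would write to a dedicated log file instead
logger = build_unleash_logger(io)
logger.info({ event: 'fetch_succeeded', flags: 12, duration_ms: 35 })
logger.error({ event: 'fetch_failed', error: 'timeout' })
puts io.string
```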

Resiliency/Fallback plan

If unleash-client puts pressure on production load and an SRE decides we should turn it off immediately, how can we turn it off and fall back to the existing behavior? A kill switch such as Feature.enabled?(:unleash_server_enabled) seems necessary.
When the polling thread of unleash-client dies, how can we recover it without restarting the entire Rails fleet?
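
The kill switch could be sketched as a small router that consults Unleash only while the switch is on, and otherwise (or on any client error) answers with the existing check. FlagRouter and its three callables are hypothetical stand-ins so the sketch runs without Flipper or unleash-client.

```ruby
# Routes flag checks to Unleash only while a kill switch is on, and falls back
# to the existing check when the switch is off or the Unleash call raises.
class FlagRouter
  def initialize(kill_switch:, unleash:, legacy:)
    @kill_switch = kill_switch # e.g. -> { Feature.enabled?(:unleash_server_enabled) }
    @unleash = unleash         # stand-in for the Unleash flag check
    @legacy = legacy           # stand-in for the existing Flipper-backed check
  end

  def enabled?(name)
    return @legacy.call(name) unless @kill_switch.call

    begin
      @unleash.call(name)
    rescue StandardError
      # Any client failure degrades to the existing behavior instead of erroring.
      @legacy.call(name)
    end
  end
end

switch_on = true
router = FlagRouter.new(
  kill_switch: -> { switch_on },
  unleash: ->(_name) { raise 'polling thread died' },
  legacy: ->(name) { name == :feature_a }
)

puts router.enabled?(:feature_a) # Unleash raised, so the legacy answer is used
switch_on = false
puts router.enabled?(:feature_b) # switch is off, so Unleash is never called
```

This only covers the read path. Restarting a dead polling thread without a fleet restart would still need a separate mechanism.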

Evaluation plan

https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7666
Environments: dev.gitlab.org, staging.gitlab.com, canary.gitlab.com and gitlab.com

Documentation/Education

Update documentation to encourage employees to use Unleash-FF for their features https://docs.gitlab.com/ee/development/feature_flags/development.html
Update documentation about how to control Unleash-FF https://docs.gitlab.com/ee/development/feature_flags/controls.html
Update the RuboCop rule to ban Flipper.enabled?.
Provide helper methods for RSpec, e.g. stub_feature_flags

HA

As long as it stays within GitLab-Rails, it's automatically HA.

Geo

If the primary and secondary nodes point to the same GitLab instance as the Unleash server, updated values are automatically synchronized across all nodes (because of polling).

On-prem/Omnibus GitLab

Where does it store FF values? In the production DB?
Create a hidden project for the control panel?
Provide a console command (via gitlab-rails console) to control their flags e.g. https://docs.gitlab.com/ee/administration/job_traces.html#enabling-live-trace

Transition period

How do we handle existing Flipper-FF? Should we migrate?

How the system checks a feature on/off with unleash

sequenceDiagram
    participant postgres
    participant unleash server
    participant unleash client
    participant global var
    participant local storage
    participant FeatureA
    loop Polling every 15 sec
      unleash client->>unleash server: Request flags api/v4/feature_flags/unleash/:id
      activate unleash server
      unleash server->>postgres: Retrieving flag data from DB
      activate postgres
      postgres-->>unleash server: Return flag data
      deactivate postgres
      unleash server-->>unleash client: Return flag data
      deactivate unleash server
      unleash client->>local storage: Write flag data as backup file
      unleash client->>global var: Write flag data in memory
    end
    Note left of FeatureA: FeatureA checks if the flag is on
    FeatureA->>unleash client: Unleash.is_enabled?(:feature_a)
    activate unleash client
    unleash client->>global var: Read flag data from memory
    activate global var
    global var-->>unleash client: Return flag data
    deactivate global var
    unleash client->>unleash client: Evaluate with strategies
    unleash client-->>FeatureA: Return true/false
    deactivate unleash client

How the system checks a feature on/off with flipper

sequenceDiagram
  participant flipper 
  participant ThreadCache
  participant redis
  participant postgres
  participant FeatureA 
  Note left of FeatureA: FeatureA checks if the flag is on
  FeatureA->>flipper:Feature.enabled?(:feature_a)
  activate flipper
  flipper->>ThreadCache:Try to read flag data
  activate ThreadCache
  ThreadCache-->>flipper:Return flag data if exist
  deactivate ThreadCache
  opt if flag data is not found
  flipper->>redis:Try to read flag data
  activate redis
  redis-->>flipper:Return flag data if exist
  deactivate redis
  flipper->>ThreadCache: Cache flag data
  end
  opt if flag data is not found
  flipper->>postgres:Try to read flag data
  activate postgres
  postgres-->>flipper:Return flag data if exist
  deactivate postgres
  flipper->>ThreadCache: Cache flag data
  flipper->>redis: Cache flag data
  end
  flipper->>flipper: Evaluate with gate
  flipper-->>FeatureA: Return true/false
  deactivate flipper
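
The Flipper lookup in the diagram above is a read-through cache across three layers. A minimal sketch, with plain Hashes standing in for the thread cache, Redis, and PostgreSQL (read_flag is a hypothetical helper, not Flipper's API):

```ruby
# Read-through lookup across ordered layers: thread cache, then Redis, then
# PostgreSQL. Returns the flag data, or nil when no layer has it.
def read_flag(name, layers)
  layers.each_with_index do |layer, index|
    next unless layer.key?(name)

    value = layer[name]
    # Backfill every faster layer that missed, like the "Cache flag data" steps.
    layers.first(index).each { |faster| faster[name] = value }
    return value
  end
  nil
end

thread_cache = {}
redis = {}
postgres = { feature_a: { enabled: true } }
layers = [thread_cache, redis, postgres]

read_flag(:feature_a, layers)
puts thread_cache.key?(:feature_a) # the first read populated the thread cache
puts redis.key?(:feature_a)        # and Redis as well
```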

Slack

f_feature_flag

/cc @rspeicher @yorickpeterse

Edited Apr 13, 2021 by Orit Golowinski