Research Spike: Use GitLab's Feature Flags for GitLab's own development

Description

As part of dogfooding, we should use GitLab's feature flag system in our own development.

This is preliminary work for https://gitlab.com/gitlab-org/release/framework/issues/32.

Aspects

The following aspects must be addressed before dogfooding on gitlab.com:

Aspect: The new architecture is as reliable as the current Feature (Flipper) implementation

This aspect focuses on the performance, resiliency and scalability of the new architecture.

Key points

  • unleash-ruby-client might need some improvement before we run it at gitlab.com scale. It creates a Ruby Thread per Puma thread, Sidekiq process and Spring process, which hammers the Feature Flag Server.
  • We need an efficient cache mechanism to limit the performance impact. For example, the Feature Flag Client should cache flag data in Redis to reduce network I/O between server and client. Flipper supports a multi-level cache: L1 => process memory cache, L2 => Redis cache. On a cache miss, the client library fetches the flag data from the database.
  • We need to clarify the error handling and fallback policy. For example, if the Feature Flag Server is not reachable, how do the flags behave?
  • We need to make sure that flag state changes are reflected in a timely manner. For example, unleash-ruby-client lets you set a polling interval; if the interval is 1 minute, a flag change can take up to a minute to be reflected.
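
The caching and fallback points above can be sketched as a small read-through cache. This is only an illustration under stated assumptions: plain Hashes stand in for process memory and Redis, the `fetcher` lambda stands in for the call to the Feature Flag Server, and "disabled when unreachable" is one possible fallback policy, not a decided one. The class name is hypothetical.

```ruby
# Hypothetical sketch of a two-level read-through cache for flag data.
# Hashes stand in for process memory (L1) and Redis (L2).
class LayeredFlagCache
  def initialize(fetcher:, default: false)
    @l1 = {}            # L1: process memory
    @l2 = {}            # L2: shared cache (Redis in a real deployment)
    @fetcher = fetcher  # stand-in for the request to the Feature Flag Server
    @default = default  # assumed fallback value when the server is unreachable
  end

  def enabled?(name)
    return @l1[name] if @l1.key?(name)

    if @l2.key?(name)
      @l1[name] = @l2[name] # promote the value to L1
      return @l1[name]
    end

    value = @fetcher.call(name) # network I/O happens only on a full miss
    @l2[name] = value
    @l1[name] = value
  rescue StandardError
    # Assumed policy: fall back to the default if nothing is cached
    # and the server cannot be reached.
    @default
  end
end
```

A real implementation would also need expiry on both layers; the sketch omits it, which is exactly why the timeliness question above matters.
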

Aspect: The new architecture is flexible enough to allow developers to achieve their tasks

This aspect focuses on the capabilities of the new architecture.

Key points

  • Today, Feature is used in many different ways. For example, Gitlab::Experimentation is built on top of the Feature library and does a bunch of extra work that the current GitLab Feature Flag cannot handle, e.g. event tracking. Ideally, we should have one interface in Feature that lets developers achieve their goals, such as A/B testing, experiments and safe rollouts.
  • We need to cover the most common usage: enabling a feature flag for a specific actor (e.g. a project, group or user). See the guideline for feature flags in GitLab development at https://docs.gitlab.com/ee/development/feature_flags/development.html.
  • On-premises instances and local development instances (GDK/GCK) might need a Feature Flag Server. However, this should be optional, as it's cumbersome to prepare one for each GitLab instance.

Proposal: Support Flipper schema/API in GitLab's Feature Flags

Today, GitLab's feature flag only supports the Unleash schema/API, but it could also support other schemas/APIs, such as the Flipper API.

Benefits

  • We only need a minimal change to the Feature class (a client library for GitLab): adding a Flipper HTTP adapter.
  • It's performant. We can continue using the L1/L2 caches of the Feature class. Most of the time, flag data is returned from the cache layers, so it rarely sends requests to the Feature Flag server.
  • We can expect more active users of GitLab's feature flag, as Flipper is one of the most popular feature flag systems in the Ruby community (2.2k stars). The new API can be used by any Flipper project regardless of scale (Ruby projects only; for other languages, the Unleash APIs can be used).
  • This prevents us from building an Unleash-specific UI/UX, which lets us treat GitLab Feature Flag as a generic interface more than ever.
  • It can easily work with extensions (YAML, experimentation.rb, etc).

Implementation/Usage details

  • As an alternative to polling, we're going to fetch each flag's data every time a flag is evaluated. (Something similar to #26842 (closed))
  • We allow administrators to choose the Database adapter or the HTTP adapter to control flags in GitLab. The default is the Database adapter. If the HTTP adapter is chosen, it connects to a feature flag server, which is an external instance of GitLab.
  • We will likely use the HTTP adapter on the dev.gitlab.org, staging.gitlab.com and gitlab.com instances only. The other instances (e.g. on-premises ones) will likely keep using the Database adapter.
  • To switch adapters, we provide a rake task to migrate flag data from Database to HTTP, or vice versa. We can use the import feature provided by Flipper.
  • Developers can change flags via Chatops or the Rails console (i.e. Feature.enable). You can see the strategy vs gate mapping below.
  • Developers cannot change flags in GitLab's Feature Flag UI (i.e. it is read-only). This is because Chatops has some extended features, e.g. freezing all flags during a production incident. We'll likely follow this up.
  • As always, the Feature class is the central place to control feature flags in GitLab.
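
The adapter choice described above (Database by default, HTTP when configured) could look roughly like this. It is a minimal sketch: the config keys (`:adapter`, `:url`) and the two Struct stand-ins are hypothetical and do not reflect the actual gitlab.yml layout or Flipper's adapter classes.

```ruby
# Hypothetical stand-ins for the two persistent adapters.
DatabaseAdapter = Struct.new(:name)
HttpAdapter = Struct.new(:name, :url)

# Picks the persistent adapter from a configuration Hash.
# The Database adapter is the default when nothing is configured.
def build_persistent_adapter(config)
  case config.fetch(:adapter, "database")
  when "database"
    DatabaseAdapter.new("database")
  when "http"
    # The URL of the external feature flag server is required here.
    HttpAdapter.new("http", config.fetch(:url))
  else
    raise ArgumentError, "unknown adapter: #{config[:adapter]}"
  end
end
```

With an empty configuration the function returns the Database adapter, which matches the intent that on-premises and local instances need no extra setup.
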

Feature Flag Servers

Adapter hierarchy

  • L1: Process Memory
  • L2: Redis
  • Persistent storage: Choose from 1) Active Record adapter (current) 2) HTTP adapter (GitLab's Feature Flag)

Schema Mapping

GitLab UI       | Unleash's strategy   | Flipper's gate       | Example
All users       | default              | boolean              | true
User IDs        | userWithId           | actors               | Project:123
Percent rollout | gradualRolloutUserId | percentage_of_actors | 50
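
To make the mapping concrete, here is a minimal sketch that converts one Unleash strategy Hash into a Flipper-style gate. The function name and the output shape are hypothetical. The parameter names `userIds` and `percentage` are assumed to follow the usual Unleash conventions for these strategies.

```ruby
# Mapping taken from the table above.
STRATEGY_TO_GATE = {
  "default" => :boolean,
  "userWithId" => :actors,
  "gradualRolloutUserId" => :percentage_of_actors
}.freeze

# Converts an Unleash strategy ({"name" => ..., "parameters" => {...}})
# into a hypothetical { key:, value: } gate representation.
def unleash_strategy_to_flipper_gate(strategy)
  gate = STRATEGY_TO_GATE.fetch(strategy.fetch("name")) do |name|
    raise ArgumentError, "unsupported strategy: #{name}"
  end
  params = strategy.fetch("parameters", {})

  value =
    case gate
    when :boolean then true
    when :actors then params.fetch("userIds", "").split(",").map(&:strip)
    when :percentage_of_actors then Integer(params.fetch("percentage", 0))
    end

  { key: gate, value: value }
end
```

For example, a `userWithId` strategy with `"userIds" => "Project:123"` becomes an `actors` gate holding `["Project:123"]`.
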

Flipper

GET /features

features:
  - key:
    state: 
    gates:
      - key: (boolean/groups/actors/percentage_of_actors/percentage_of_time)
        name:
        value:

Unleash

GET features

version:
features:
  - name:
    description:
    enabled:
    strategies:
      - name:
        parameters:

GitLab v4 public APIs

# Flipper
GET /api/:version/feature_flags/flipper/:project_id/features(.:format) - 
GET /api/:version/feature_flags/flipper/:project_id/features/:feature_name(.:format) - 

# Unleash
GET /api/:version/feature_flags/unleash/:project_id/features(.:format) - Get a list of features (deprecated, v2 client support)
GET /api/:version/feature_flags/unleash/:project_id/client/features(.:format) - Get a list of features

Roadmap

  • Support Flipper API in GitLab Feature Flag
  • Allow configuring the HTTP adapter with gitlab.yml
  • Support percentage_of_time gate => #36380 (closed)
  • Documentation for new architecture
  • Configure HTTP adapter on dev.gitlab.org
      1. Announce when we reconfigure the instance and that developers cannot update feature flags during maintenance.
      2. Update dev.gitlab.org to use the HTTP adapter, which points to a project in ops.gitlab.net.
      3. Migrate data from ActiveRecordAdapter to HTTPAdapter.
      4. Announce that the migration is done. Developers can update feature flags via Chatops again.
  • Configure HTTP adapter on staging.gitlab.com
      1. Announce when we reconfigure the instance and that developers cannot update feature flags during maintenance.
      2. Update staging.gitlab.com to use the HTTP adapter, which points to a project in ops.gitlab.net.
      3. Migrate data from ActiveRecordAdapter to HTTPAdapter.
      4. Announce that the migration is done. Developers can update feature flags via Chatops again.
  • Evaluate the change on dev.gitlab.org and staging.gitlab.com for a month.
    • Performance monitoring.
    • Fixing issues.
  • Configure HTTP adapter on gitlab.com
      1. Announce when we reconfigure the instance and that developers cannot update feature flags during maintenance.
      2. Update gitlab.com to use the HTTP adapter, which points to a project in ops.gitlab.net.
      3. Migrate data from ActiveRecordAdapter to HTTPAdapter.
      4. Announce that the migration is done. Developers can update feature flags via Chatops again.

Follow-up

  • Allow developers to update Feature Flags in the UI.
    • Webhook to invalidate cache
  • Make Flipper API support GA

TODO

  • Finish PoC for the latest proposal => !34152 (closed)
  • Break down the issue and weight it.

Old proposals

Proposal: Support Flipper schema/API in GitLab's Feature Flags

Today, GitLab's feature flag only supports the Unleash schema/API, but it could also support other schemas/APIs, such as the Flipper API.

This brings the following benefits:

  • We don't need a significant change to the Feature class (the FF client library for GitLab). What needs to change is basically adding a Flipper HTTP adapter to the Feature class. The L1/L2 caches and the database as persistent store work as-is. A flag is fetched in the following order: 1. Process memory 2. Redis 3. GitLab's FF server 4. Database. Flag data is mostly returned from Redis, so it rarely hits the HTTP adapter.
  • We can expand the user base of GitLab's feature flag, as Flipper is a very popular project in the Ruby community (7.1k stars). The new API can be used by any Flipper project regardless of scale (Ruby projects only; for other languages, we can still provide the Unleash APIs).
  • This would be a great opportunity to generalize our Unleash-specific implementation.
  • It can easily work with extensions (YAML, experimentation.rb, etc).
  • PoC: !33683 (closed)

Adapter hierarchy

  • L1: Process Memory
  • L2: Redis
  • Persistent storage: Choose from 1) Active Record adapter (current) 2) HTTP adapter (GitLab's Feature Flag)

Key points

  • If nothing is configured, the Feature class behaves exactly as it does today.
  • If the HTTP adapter is configured, you can optionally control the flag state with GitLab Feature Flag.
  • If flag data is persisted in both the database and the HTTP adapter, the HTTP adapter takes precedence over the AR adapter.
  • No migration is required from the AR adapter to the HTTP adapter. We can gradually switch the feature flag workflow to GitLab Feature Flag.
  • No immediate workflow change is required in daily development. Engineers can still use the Feature class as they do today.
  • Chatops (Feature) updates flag states in the database. The HTTP adapter is in read-only mode and doesn't accept CRUD via Chatops yet.
  • On-premises/local development instances still use the database as the persistent store. We don't need a documentation change for alpha features that are hidden behind a flag.
  • We need to keep maintaining the schema mapping between Unleash and Flipper.
  • If there is an unsupported gate in GitLab Feature Flag (e.g. percentage_of_time), engineers should use Flipper with the ActiveRecord adapter instead of GitLab Feature Flag (HTTP adapter).
  • TODO: The cache isn't invalidated in Feature if engineers update a flag state in GitLab Feature Flag. GitLab Feature Flag must ask Feature to invalidate the cache for a specific feature key. (Maybe we can create a new endpoint in API::Features?)

Cache invalidation

Feature Flag Server:
  - GET api/v4/flipper/features

Workhorse:
  CacheInvalidation:
    - Polls flag data from the Feature Flag Server (every 15 sec)
    - If a feature state has changed, it requests Rails to invalidate the L1/L2 cache.
    - New endpoints: `api/v4/features/clear_cache` or `api/v4/features/clear_cache/:name`

GitLab-Rails:
  - L1 Process Memory
  - L2 Redis
  - HTTP adapter (which connects to GitLab Feature Flag)
  - Active Record
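
The change detection step in the poller above can be sketched as a comparison of two polled snapshots. This is illustrative only: the snapshot shape (feature name mapped to its data) and both function names are assumptions, and the real poller was proposed to live in Workhorse rather than in Ruby. The endpoint path is the one proposed above.

```ruby
# Returns the names of features whose data differs between two polls,
# including features that were added or removed in between.
def changed_features(previous, current)
  (previous.keys | current.keys).reject { |name| previous[name] == current[name] }
end

# Builds the per-feature cache invalidation paths to request from Rails.
def invalidation_paths(previous, current)
  changed_features(previous, current).map do |name|
    "api/v4/features/clear_cache/#{name}"
  end
end
```

Invalidating per feature keeps the L1/L2 caches warm for every flag that did not change between polls.
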

Classes

  • Flipper::Adapters::MultiPersistentLayer ... The adapter that combines multiple persistent layers and returns flag data from the layer with the highest precedence.
  • Flipper::Adapters::ReadonlyHttp ... Read-only adapter for HTTP
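
The precedence rule of the multi-layer adapter can be sketched as follows. This is a simplified illustration, not the PoC's code: the class names are hypothetical, and a layer is assumed to return nil for an unknown feature, which is simpler than what real Flipper adapters return.

```ruby
# Hypothetical sketch: read from several persistent layers and return
# the data from the first (highest precedence) layer that has it.
class MultiPersistentLayerSketch
  # `layers` is ordered from highest to lowest precedence,
  # e.g. HTTP adapter first, then the Active Record adapter.
  def initialize(*layers)
    @layers = layers
  end

  def get(feature_name)
    @layers.each do |layer|
      data = layer.get(feature_name)
      return data unless data.nil?
    end
    nil
  end
end

# Minimal in-memory layer used to exercise the sketch.
class HashLayer
  def initialize(data)
    @data = data
  end

  def get(feature_name)
    @data[feature_name]
  end
end
```

Passing the HTTP layer before the Active Record layer reproduces the "HTTP adapter takes precedence over AR adapter" rule from the key points.
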

GitLab v4 public APIs

# Flipper
GET /api/:version/feature_flags/flipper/:project_id/features(.:format) - 
GET /api/:version/feature_flags/flipper/:project_id/features/:feature_name(.:format) - 

# Unleash
GET /api/:version/feature_flags/unleash/:project_id/features(.:format) - Get a list of features (deprecated, v2 client support)
GET /api/:version/feature_flags/unleash/:project_id/client/features(.:format) - Get a list of features

Schema difference

Flipper

GET /features

features:
  - key:
    state: 
    gates:
      - key: (boolean/groups/actors/percentage_of_actors/percentage_of_time)
        name:
        value:

GET /features/{feature_name}

key:
state: 
gates:
  - key:
    name:
    value:

Unleash

GET features

version:
features:
  - name:
    description:
    enabled:
    strategies:
      - name:
        parameters:

Proposal: Introduce a proxy server as part of GitLab services

  • Single service to poll flag data
  • Server-side flag state computation (e.g. gates/strategy)
  • Reduce HTTP traffic between gitlab.com and FF server
  • LaunchDarkly provides Relay Proxy
  • Unleash Proxy
  • This should be optional. It is especially recommended for large-scale architectures.
  • Allows frontend-only flags (i.e. JavaScript)
  • We still need a solution for how to create an Unleash adapter in Feature

sequenceDiagram
    participant FF Server
    participant Workhorse
    participant FF Proxy
    participant Rails
    Note right of FF Proxy: Proxy returns cached results
    Rails->>FF Proxy: GET /api/v4/operations/features/:id
    activate FF Proxy
    FF Proxy-->>Rails: Returns computed flag state (String)
    deactivate FF Proxy
    Note left of FF Proxy: Proxy periodically fetches all flag data from FF Server
    FF Proxy->>FF Server: GET /api/v4/operations/features
    activate FF Server
    FF Server-->>FF Proxy: Returns all flag data (JSON)
    Note left of FF Proxy: Proxy periodically reports the metrics of flag usage
    FF Proxy->>FF Server: POST /api/v4/operations/metrics
    activate FF Server
    FF Server-->>FF Proxy: Returns all flag data (JSON)
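
The proxy's role in the diagram (hold the last fetched flag data and compute the state server-side) can be sketched as below. Everything here is an assumption for illustration: the class name, the gate Hash shape, and the CRC32-based bucketing, which is one common way to get a stable percentage rollout and is not claimed to match any particular library.

```ruby
require "zlib"

# Hypothetical sketch of a proxy that serves computed flag states
# from its last snapshot of the FF server's data.
class FlagProxySketch
  def initialize
    @flags = {}
  end

  # Called periodically with the full flag payload from the FF server.
  def refresh(flags)
    @flags = flags
  end

  # Computes the state for one actor from the cached snapshot only.
  def enabled?(name, actor_key)
    gates = @flags.fetch(name, {})
    return true if gates[:boolean]
    return true if gates.fetch(:actors, []).include?(actor_key)

    # Stable bucketing: the same flag/actor pair always lands in the
    # same bucket between 0 and 99.
    percentage = gates.fetch(:percentage_of_actors, 0)
    Zlib.crc32("#{name}#{actor_key}") % 100 < percentage
  end
end
```

Because evaluation never touches the network, Rails only pays for one request to the proxy, and the proxy alone polls the FF server.
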

Proposal: Generalize Feature Flag system and implement unleash adapter

This MR accomplishes the following things:

  • Generalize the Feature class. Currently, it depends heavily on the Flipper implementation, so it's very hard to extend for other FF tools, such as Unleash. We create adapters in the FeatureFlag::Adapters domain to make the Feature class compatible with any adapter. The system always chooses one of the adapters and keeps using it until the configuration file is changed.
  • Unleash configuration is defined in gitlab.yml/gitlab.rb. This allows us to turn on Unleash-FF in any environment flexibly. You can start using it in a local environment, even today.
  • Initialize Unleash clients in initializers/unleash.rb (for Unicorn) or puma.rb (for Puma). This is the same initialization approach described in the official guideline: https://github.com/Unleash/unleash-client-ruby
  • If Unleash is disabled in gitlab.yml, Flipper is always used, like today.
  • Log the Unleash client's activity in unleash.log with structured logging.
  • Per-project/group/user gates are possible with the userWithId strategy (although this requires https://gitlab.com/gitlab-org/gitlab-ee/issues/14017 to be fixed first). If you want to enable a particular feature on a project, define a param following the convention <The class of Active Record>:<The id of the row>, e.g. Project:123.
  • This still has the problem that it basically uses a polling mechanism (unleash-ruby-client), and therefore it doesn't meet the criteria from the performance aspect.
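
The `<The class of Active Record>:<The id of the row>` convention above can be illustrated with a pair of helpers. This is a minimal sketch: `Project` is a Struct stand-in for an Active Record model, and both helper names are hypothetical.

```ruby
# Stand-in for an Active Record model; only `id` matters here.
Project = Struct.new(:id)

# Builds the userWithId parameter for a record, e.g. "Project:123".
def actor_key(record)
  "#{record.class.name}:#{record.id}"
end

# Splits a key back into class name and numeric id, rejecting
# anything that does not follow the <Class>:<id> convention.
def parse_actor_key(key)
  class_name, id = key.split(":", 2)
  unless class_name.to_s.match?(/\A[A-Z]\w*\z/) && id.to_s.match?(/\A\d+\z/)
    raise ArgumentError, "expected <Class>:<id>, got #{key.inspect}"
  end

  [class_name, Integer(id, 10)]
end
```

Validating the key on the way in helps catch plain user IDs that were entered without a class prefix.
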
Edited by Shinya Maeda