Research Spike: Use GitLab's Feature Flags for GitLab's development
Description
In the spirit of dogfooding, we should use GitLab's feature flag system in our own development.
This is preliminary work for https://gitlab.com/gitlab-org/release/framework/issues/32.
Aspects
The following aspects must be addressed before dogfooding on gitlab.com:
Aspect: The new architecture is at least as reliable as the current `Feature` (Flipper) implementation
This aspect focuses on performance, resiliency and scalability of the new architecture.
Key points
- unleash-ruby-client might need some improvement to run at gitlab.com scale. It creates a Ruby `Thread` per Puma thread, Sidekiq process and Spring process, which hammers the Feature Flag Server.
- We'd need an efficient cache mechanism to mitigate performance issues. For example, the Feature Flag client should cache flag data in Redis to reduce network I/O between the server and client sides. Flipper supports a multi-level cache: L1 => process memory cache, L2 => Redis cache. If the flag data is not found in either cache, the client library fetches it from the database.
- We need to clarify the error handling and fallback policy. For example, if the Feature Flag Server is not reachable, how do the flags behave?
- We need to make sure that flag states are reflected in a timely manner. For example, unleash-ruby-client polls at a configurable interval; if the interval is 1 minute, a flag change may take up to a minute to be reflected.
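The fallback policy is still open. As one possible answer, here is a minimal sketch assuming a hypothetical `FlagFetcher` that wraps the call to the Feature Flag Server: it serves a snapshot until it is `ttl` seconds old (which bounds staleness, like the polling interval above), keeps serving the last-known-good snapshot when the server is unreachable, and treats never-seen flags as off. The class, its parameters and the "off by default" choice are assumptions, not decided behavior.

```ruby
# Hypothetical sketch: FlagFetcher is not part of GitLab or unleash-ruby-client.
class FlagFetcher
  # fetcher: callable returning { "flag_name" => true/false }, raising on network errors.
  # ttl:     seconds a snapshot counts as fresh; this bounds how stale a flag can be.
  # default: value for flags we have never seen (assumed "off").
  # clock:   injectable monotonic clock, so tests can move time forward.
  def initialize(fetcher:, ttl: 60, default: false,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @fetcher = fetcher
    @ttl = ttl
    @default = default
    @clock = clock
    @snapshot = nil
    @checked_at = nil
  end

  def enabled?(name)
    refresh if stale?
    return @default if @snapshot.nil? # no successful fetch yet

    @snapshot.fetch(name.to_s, @default)
  end

  private

  def stale?
    @checked_at.nil? || @clock.call - @checked_at >= @ttl
  end

  def refresh
    @snapshot = @fetcher.call
  rescue StandardError
    # Server unreachable: keep serving the last-known-good snapshot (if any).
  ensure
    # Record the attempt either way, so an outage does not become one request per check.
    @checked_at = @clock.call
  end
end
```

A shorter `ttl` reflects flag changes faster at the cost of more requests to the Feature Flag Server.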
Aspect: The new architecture is flexible enough to allow developers to achieve their tasks
This aspect focuses on capability of the new architecture.
Key Points
- Today, `Feature` is used in many different ways. For example, `Gitlab::Experimentation` is built on top of the `Feature` library and does a number of extra things that the current GitLab Feature Flag cannot handle, e.g. event tracking. Ideally, `Feature` should be the single interface that lets developers accomplish their goals, such as A/B testing, experiments and safe rollouts.
- We need to cover the most common usage: enabling a feature flag for a specific actor (e.g. a project, group or user). See the Feature Flags for GitLab development guideline at https://docs.gitlab.com/ee/development/feature_flags/development.html.
- On-premises instances and local development instances (GDK/GCK) might need a Feature Flag Server; however, it should be optional, as it's cumbersome to set one up for each GitLab instance.
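Enabling a flag for a specific actor boils down to storing the actor's identifier on the flag. A minimal in-memory sketch, assuming actors expose a `flipper_id` in the `<class>:<id>` format used by the `Project:123` example in the Schema Mapping table below; `ActorFlags`, `FlipperIdentifiable`, `Project` and `Group` are hypothetical stand-ins, not Flipper's or GitLab's classes.

```ruby
require 'set'

# Builds an actor identifier in the "<class name>:<id>" convention, e.g. "Project:123".
module FlipperIdentifiable
  def flipper_id
    "#{self.class.name}:#{id}"
  end
end

# Hypothetical stand-ins for ActiveRecord models; only #id and #flipper_id matter here.
Project = Struct.new(:id) { include FlipperIdentifiable }
Group   = Struct.new(:id) { include FlipperIdentifiable }

# Minimal in-memory flag store with a boolean gate and an actors gate.
class ActorFlags
  def initialize
    @boolean = Hash.new(false)
    @actors = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # enable(:flag) turns it on for everyone; enable(:flag, actor) only for that actor.
  def enable(feature, actor = nil)
    if actor
      @actors[feature.to_s] << actor.flipper_id
    else
      @boolean[feature.to_s] = true
    end
  end

  def enabled?(feature, actor = nil)
    key = feature.to_s
    return true if @boolean[key]
    return false if actor.nil?

    @actors.fetch(key, Set.new).include?(actor.flipper_id)
  end
end
```

Because the class name is part of the identifier, `Project:123` and `Group:123` are distinct actors.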
Proposal: Support Flipper schema/API in GitLab's Feature Flags
Today, GitLab's feature flag only supports the Unleash schema/API, but it could also support other schemas/APIs such as the Flipper API.
Benefits
- Only a minimal change to the `Feature` class (GitLab's client library) is needed: adding the Flipper HTTP adapter.
- It's performant. We can keep using the L1/L2 caches of the `Feature` class. Most of the time, flag data is returned from the cache layers, so requests rarely reach the Feature Flag server.
- We can expect more active users of GitLab's feature flags, as Flipper is one of the most popular feature flag systems in the Ruby community (2.2k stars). The new API can be used by any Flipper project regardless of scale (Ruby projects only; for other languages, the Unleash APIs can be used).
- This keeps us from building an Unleash-specific UI/UX and lets us treat GitLab Feature Flag as a generic interface.
- It can easily work with extensions (YAML, experimentation.rb, etc).
Implementation/Usage details
- As an alternative to polling, we're going to fetch a flag's data every time the flag is evaluated (similar to #26842 (closed)).
- We allow administrators to choose either the Database adapter or the HTTP adapter to control flags in GitLab. The default is the Database adapter. If the HTTP adapter is chosen, GitLab connects to a feature flag server, which is an external GitLab instance.
- We will likely use the HTTP adapter only on dev.gitlab.org, staging.gitlab.com and gitlab.com. Other instances (e.g. on-premises ones) will likely keep using the Database adapter.
- To switch adapters, we provide a rake task that migrates flag data from the Database adapter to the HTTP adapter, or vice versa, using the import feature provided by Flipper.
- Developers can change flags via ChatOps or the Rails console (e.g. `Feature.enable`). See the strategy vs. gate mapping below.
- Developers cannot change flags in GitLab's Feature Flag UI (i.e. it is read-only), because ChatOps has some extended features, e.g. freezing all flags during a production incident. We'll likely address this in a follow-up.
- As always, Feature class is the central place to control feature flags in GitLab.
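The adapter choice and the migration rake task could look roughly like this. It is a sketch under assumptions: the `feature_flags_adapter` config key, `build_adapter`, `migrate_flags` and the simplified `get`/`set` adapter interface are all hypothetical, and both adapters are in-memory stand-ins (a real HTTP adapter would talk to one of the ops.gitlab.net projects listed below). The real task would rely on Flipper's import feature instead of copying flags by hand.

```ruby
# Hypothetical, simplified adapter interface (not Flipper's adapter API):
#   #features -> Array of names, #get(name) -> Hash of gate data, #set(name, gates).
class InMemoryAdapter
  attr_reader :kind

  def initialize(kind, data = {})
    @kind = kind
    @data = data.dup
  end

  def features
    @data.keys
  end

  def get(name)
    @data.fetch(name, {})
  end

  def set(name, gates)
    @data[name] = gates
  end
end

# Picks the persistent adapter from configuration; the Database adapter is the default.
def build_adapter(config)
  case config.fetch(:feature_flags_adapter, "database")
  when "database" then InMemoryAdapter.new(:database)
  when "http"     then InMemoryAdapter.new(:http) # would point at ops.gitlab.net in practice
  else raise ArgumentError, "unknown adapter: #{config[:feature_flags_adapter]}"
  end
end

# Copies every flag from one adapter to the other (Database -> HTTP or vice versa).
# Returns the names of the migrated features.
def migrate_flags(from:, to:)
  from.features.each { |name| to.set(name, from.get(name)) }
  from.features
end
```

Keeping the migration adapter-agnostic is what makes "or vice versa" free: rolling back is the same call with `from:` and `to:` swapped.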
Feature Flag Servers
- dev.gitlab.org ... https://ops.gitlab.net/gitlab-org/feature_flags/dev
- staging.gitlab.com ... https://ops.gitlab.net/gitlab-org/feature_flags/staging
- gitlab.com ... https://ops.gitlab.net/gitlab-org/feature_flags/production
Adapter hierarchy
- L1: Process Memory
- L2: Redis
- Persistent storage: Choose from 1) Active Record adapter (current) 2) HTTP adapter (GitLab's Feature Flag)
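The lookup order above works as a read-through cache: check L1, then L2, then the persistent adapter, filling the upper layers on the way back. A minimal sketch, assuming a plain Hash in place of Redis so it runs without a server; `LayeredFlagStore` and its `persistent` callable are hypothetical names.

```ruby
# Hypothetical sketch of the L1 -> L2 -> persistent storage lookup order.
class LayeredFlagStore
  # persistent: callable name -> gate data (Active Record adapter or HTTP adapter).
  # l2:         Hash standing in for Redis; pass the same Hash to share it across "processes".
  def initialize(persistent:, l2: {})
    @l1 = {} # L1: memory of this process only (no TTL in this sketch)
    @l2 = l2
    @persistent = persistent
  end

  def get(name)
    @l1.fetch(name) do
      value = @l2.fetch(name) { @l2[name] = @persistent.call(name) }
      @l1[name] = value
    end
  end

  # Drops a flag from this process's L1 and from the shared L2.
  # Other processes keep their own L1 copy until it is cleared there too.
  def expire(name)
    @l1.delete(name)
    @l2.delete(name)
  end
end
```

The comment on `expire` is the crux of the cache invalidation problem: clearing L2 is one call, but every process's L1 also needs a short TTL or an explicit invalidation signal.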
Schema Mapping
GitLab UI | Unleash's strategy | Flipper's gate | Example |
---|---|---|---|
All users | default | boolean | true |
User IDs | userWithId | actors | Project:123 |
Percent rollout | gradualRolloutUserId | percentage_of_actors | 50 |
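How the three gates in the table decide whether a flag is on can be sketched as below. The percentage bucketing hashes the feature name plus the actor id with CRC32, in the spirit of Flipper's `percentage_of_actors` gate; the exact formula here is an assumption. To our understanding, Unleash's `gradualRolloutUserId` uses a different hash (MurmurHash3), so a 50% rollout may select a different set of actors after mapping one onto the other. `flag_enabled?` and the shape of the gates hash are hypothetical.

```ruby
require 'zlib'

# Hypothetical evaluator for the gates in the Schema Mapping table.
# gates: { boolean: true/false, actors: ["Project:123", ...], percentage_of_actors: 0..100 }
def flag_enabled?(feature, gates, actor_id = nil)
  return true if gates[:boolean]                          # "All users" / default
  return false if actor_id.nil?                           # the other gates need an actor
  return true if Array(gates[:actors]).include?(actor_id) # "User IDs" / userWithId

  # "Percent rollout": deterministic bucket in 0..99 per (feature, actor), so an actor
  # keeps its answer across requests and raising the percentage only adds actors.
  bucket = Zlib.crc32("#{feature}#{actor_id}") % 100
  bucket < gates[:percentage_of_actors].to_i
end
```

Mixing the feature name into the hash keeps different flags at 50% from all selecting the same half of the actors.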
Flipper
GET /features
features:
  - key:
    state:
    gates:
      - key: (boolean/groups/actors/percentage_of_actors/percentage_of_time)
        name:
        value:
Unleash
GET features
version:
features:
  - name:
    description:
    enabled:
    strategies:
      - name:
        parameters:
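Serving the Flipper schema from flags stored in the Unleash shape is mostly a mechanical translation along the Schema Mapping table. A minimal sketch, assuming the Unleash parameter names `userIds` (comma-separated) and `percentage`, and Flipper-style `state` values `on`/`off`/`conditional`; `unleash_to_flipper` is hypothetical, the gate `name` fields simply repeat the key, and strategies without a mapped gate are dropped.

```ruby
# Hypothetical converter from one Unleash feature (client API shape) to the Flipper shape.
def unleash_to_flipper(feature)
  gates = { "boolean" => nil, "actors" => [], "percentage_of_actors" => nil }

  if feature["enabled"]
    Array(feature["strategies"]).each do |strategy|
      params = strategy["parameters"] || {}
      case strategy["name"]
      when "default"
        gates["boolean"] = true
      when "userWithId" # userIds is a comma-separated string, e.g. "Project:123,Group:7"
        ids = params.fetch("userIds", "").split(",").map(&:strip).reject(&:empty?)
        gates["actors"] |= ids
      when "gradualRolloutUserId"
        gates["percentage_of_actors"] = Integer(params.fetch("percentage", 0).to_s, 10)
      end
    end
  end

  state =
    if gates["boolean"]
      "on"
    elsif gates["actors"].any? || gates["percentage_of_actors"].to_i.positive?
      "conditional"
    else
      "off"
    end

  {
    "key" => feature["name"],
    "state" => state,
    "gates" => gates.map { |key, value| { "key" => key, "name" => key, "value" => value } }
  }
end
```

This is also where "we need to keep maintaining schema mapping" becomes concrete: every new Unleash strategy needs a branch here or it silently disappears from the Flipper view.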
GitLab v4 public APIs
# Flipper
GET /api/:version/feature_flags/flipper/:project_id/features(.:format) - Get a list of features
GET /api/:version/feature_flags/flipper/:project_id/features/:feature_name(.:format) - Get a single feature
# Unleash
GET /api/:version/feature_flags/unleash/:project_id/features(.:format) - Get a list of features (deprecated, v2 client support)
GET /api/:version/feature_flags/unleash/:project_id/client/features(.:format) - Get a list of features
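A read-only HTTP client for the proposed Flipper endpoints might look like the sketch below. `FlipperApiClient` is hypothetical, the `v4` API version is assumed, and authentication is left out because the proposal doesn't specify it; the transport is injectable so the client can be exercised without a server (it defaults to `Net::HTTP.get_response`).

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical read-only client for the Flipper endpoints listed above.
class FlipperApiClient
  def initialize(base_url:, project_id:, transport: nil)
    # Trailing slash so relative joins below append to the project path.
    @base = URI.join(base_url, "/api/v4/feature_flags/flipper/#{project_id}/")
    @transport = transport || ->(uri) { Net::HTTP.get_response(uri) }
  end

  # GET .../features => Array of { "key", "state", "gates" }
  def features
    get("features")["features"]
  end

  # GET .../features/:feature_name => { "key", "state", "gates" }
  def feature(name)
    get("features/#{URI.encode_www_form_component(name)}")
  end

  private

  def get(path)
    uri = URI.join(@base, path)
    response = @transport.call(uri)
    unless response.code.to_i.between?(200, 299)
      raise "Feature flag server returned #{response.code} for #{uri}"
    end

    JSON.parse(response.body)
  end
end
```

Raising on non-2xx responses leaves the fallback decision to the caller (for example the L1/L2 layers keeping their last value) instead of silently treating an error as "flag off".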
Roadmap
- Support Flipper API in GitLab Feature Flag
- Allow configuring the HTTP adapter in gitlab.yml
- Support `percentage_of_time` gate => #36380 (closed)
- Documentation for the new architecture
- Configure HTTP adapter on dev.gitlab.org
  - Announce when we reconfigure the instance; developers cannot update feature flags during maintenance.
  - Update dev.gitlab.org to use the HTTP adapter, which points to a project in ops.gitlab.net.
  - Migrate data from ActiveRecordAdapter to HTTPAdapter.
  - Announce that the migration is done. Developers can update feature flags via ChatOps again.
- Configure HTTP adapter on staging.gitlab.com
  - Announce when we reconfigure the instance; developers cannot update feature flags during maintenance.
  - Update staging.gitlab.com to use the HTTP adapter, which points to a project in ops.gitlab.net.
  - Migrate data from ActiveRecordAdapter to HTTPAdapter.
  - Announce that the migration is done. Developers can update feature flags via ChatOps again.
- Evaluate the change on dev.gitlab.org and staging.gitlab.com for a month.
  - Performance monitoring.
  - Fixing issues.
- Configure HTTP adapter on gitlab.com
  - Announce when we reconfigure the instance; developers cannot update feature flags during maintenance.
  - Update gitlab.com to use the HTTP adapter, which points to a project in ops.gitlab.net.
  - Migrate data from ActiveRecordAdapter to HTTPAdapter.
  - Announce that the migration is done. Developers can update feature flags via ChatOps again.
Follow-up
- Allow developers to update feature flags in the UI.
- Webhook to invalidate cache
- Make Flipper API support GA
TODO
- Finish PoC for the latest proposal => !34152 (closed)
- Break down the issue and weight it.
old proposals
## Proposal: Support Flipper schema/API in GitLab's Feature Flags
Today, GitLab's feature flag only supports the Unleash schema/API, but it could also support other schemas/APIs such as the Flipper API.
This brings the following benefits:
- We don't need significant changes to the `Feature` class (GitLab's FF client library). Essentially, only a Flipper HTTP adapter needs to be added to the `Feature` class. The L1/L2 caches and the database as persistent store work as-is. A flag is fetched in the following order: 1. Process memory 2. Redis 3. GitLab's FF server 4. Database. Most flag data is returned from Redis, so the HTTP adapter is rarely hit.
- We can expand the user base of GitLab's feature flags, as Flipper is a very popular project in the Ruby community (7.1k stars). The new API can be used by any Flipper project regardless of scale (Ruby projects only; for other languages we can still provide the Unleash APIs).
- This would be a great opportunity to generalize our Unleash-specific implementation.
- It can easily work with extensions (YAML, experimentation.rb, etc).
- PoC: !33683 (closed)
Adapter hierarchy
- L1: Process Memory
- L2: Redis
- Persistent storage: Choose from 1) Active Record adapter (current) 2) HTTP adapter (GitLab's Feature Flag)
Key points
- If nothing is configured, the `Feature` class behaves exactly as it does today.
- If the HTTP adapter is configured, you can optionally control flag states with GitLab Feature Flag.
- If a flag's data is persisted in both the database and the HTTP adapter, the HTTP adapter takes precedence over the AR adapter.
- No migration from the AR adapter to the HTTP adapter is required. We can gradually switch the feature flag workflow to GitLab Feature Flag.
- No immediate change to the daily development workflow is required. Engineers can still use the `Feature` class as they do today.
- ChatOps (`Feature`) updates flag states in the database. The HTTP adapter is read-only and doesn't accept CRUD operations via ChatOps yet.
- On-premises and local development instances still use the database as persistent store. No documentation change is needed for alpha features hidden behind a flag.
- We need to keep maintaining the schema mapping between Unleash and Flipper.
- If a gate is unsupported in GitLab Feature Flag (e.g. `percentage_of_time`), engineers should use Flipper with the ActiveRecord adapter instead of GitLab Feature Flag (HTTP adapter).
- TODO: The cache in `Feature` isn't invalidated when engineers update a flag state in GitLab Feature Flag. GitLab Feature Flag must send a request to `Feature` to invalidate the cache for a specific feature key. (Maybe we can create a new endpoint in `API::Features`?)
Cache invalidation
Feature Flag Server:
- GET api/v4/flipper/features
Workhorse:
CacheInvalidation:
- Polls flag data from the Feature Flag Server (every 15 sec)
- If a feature state has changed, it requests Rails to invalidate the L1/L2 cache.
- New endpoints: `api/v4/features/clear_cache` or `api/v4/features/clear_cache/:name`
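The poller's core step is diffing two snapshots of `GET api/v4/flipper/features` and clearing the cache for every key that was added, removed or changed. Workhorse is written in Go; this sketch is in Ruby only to keep one language in this issue. `changed_feature_keys` and `invalidation_paths` are hypothetical helpers, and the snapshot shape (feature key => gates) is an assumption.

```ruby
# Hypothetical diff step for the cache-invalidation poller.
# previous/current: snapshots of the Feature Flag Server, feature key => gate data.
# Returns keys that were added, removed, or whose gates changed.
def changed_feature_keys(previous, current)
  (previous.keys | current.keys).reject { |key| previous[key] == current[key] }
end

# Builds the per-feature invalidation requests using the proposed endpoint.
def invalidation_paths(keys)
  keys.map { |key| "api/v4/features/clear_cache/#{key}" }
end
```

With a 15-second poll, a change made in GitLab Feature Flag reaches the L1/L2 caches within roughly one poll interval plus the invalidation request.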
GitLab-Rails
- L1 Process Memory
- L2 Redis
- HTTP adapter (which connects to GitLab Feature Flag)
- Active Record
Classes
- Flipper::Adapters::MultiPersistentLayer ... The adapter to combine multiple persistent layers and return flag data from the highest precedence.
- Flipper::Adapters::ReadonlyHttp ... Readonly adapter for HTTP
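The precedence rule (HTTP adapter over AR adapter) and the read-only HTTP adapter can be sketched with two stand-in adapters. The class names loosely follow the two classes above, but `get`/`enable`/`writable?` are a simplified hypothetical interface, not Flipper's adapter API, and the data lives in Hashes.

```ruby
# Read-only stand-in for the HTTP adapter: serves remote data, rejects writes.
class ReadonlyHttpAdapter
  class WriteAttempted < StandardError; end

  def initialize(remote)
    @remote = remote # Hash standing in for data fetched from GitLab Feature Flag
  end

  def writable?
    false
  end

  def get(key)
    @remote[key]
  end

  def enable(_key, _gates)
    raise WriteAttempted, "HTTP adapter is read-only; change the flag in GitLab Feature Flag"
  end
end

# Stand-in for the ActiveRecord adapter (what ChatOps writes to today).
class DatabaseAdapter
  def initialize
    @rows = {}
  end

  def writable?
    true
  end

  def get(key)
    @rows[key]
  end

  def enable(key, gates)
    @rows[key] = gates
  end
end

# Returns flag data from the highest-precedence layer that has it;
# writes go to the first writable layer.
class MultiPersistentLayer
  def initialize(*layers)
    @layers = layers # ordered by precedence, highest first
  end

  def get(key)
    @layers.each do |layer|
      value = layer.get(key)
      return value unless value.nil?
    end
    nil
  end

  def enable(key, gates)
    writable = @layers.find(&:writable?)
    raise ArgumentError, "no writable layer" if writable.nil?

    writable.enable(key, gates)
  end
end
```

One consequence worth noting: a ChatOps write to the database is invisible while the HTTP adapter holds the same key, which matches the precedence rule but may surprise engineers.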
GitLab v4 public APIs
# Flipper
GET /api/:version/feature_flags/flipper/:project_id/features(.:format) - Get a list of features
GET /api/:version/feature_flags/flipper/:project_id/features/:feature_name(.:format) - Get a single feature
# Unleash
GET /api/:version/feature_flags/unleash/:project_id/features(.:format) - Get a list of features (deprecated, v2 client support)
GET /api/:version/feature_flags/unleash/:project_id/client/features(.:format) - Get a list of features
Schema difference
Flipper
GET /features
features:
  - key:
    state:
    gates:
      - key: (boolean/groups/actors/percentage_of_actors/percentage_of_time)
        name:
        value:
GET /features/{feature_name}
key:
state:
gates:
  - key:
    name:
    value:
Unleash
GET features
version:
features:
  - name:
    description:
    enabled:
    strategies:
      - name:
        parameters:
Proposal: Introduce a proxy server as part of GitLab services
- Single service to poll flag data
- Server-side flag state computation (e.g. gates/strategy)
- Reduce HTTP traffic between gitlab.com and FF server
- LaunchDarkly provides a Relay Proxy
- Unleash Proxy
- This should be optional; it's especially recommended for large-scale architectures.
- Allows frontend-only flags (i.e. JavaScript)
- We still need a solution for creating an Unleash adapter in `Feature`.
sequenceDiagram
participant FF Server
participant Workhorse
participant FF Proxy
participant Rails
Note right of FF Proxy: Proxy returns cached results
Rails->>FF Proxy: GET /api/v4/operations/features/:id
activate FF Proxy
FF Proxy-->>Rails: Returns computed flag state (String)
deactivate FF Proxy
Note left of FF Proxy: Proxy periodically fetches all flag data from FF Server
FF Proxy->>FF Server: GET /api/v4/operations/features
activate FF Server
FF Server-->>FF Proxy: Returns all flag data (JSON)
deactivate FF Server
Note left of FF Proxy: Proxy periodically reports the metrics of flag usage
FF Proxy->>FF Server: POST /api/v4/operations/metrics
activate FF Server
FF Server-->>FF Proxy: Acknowledges the metrics
deactivate FF Server
Proposal: Generalize Feature Flag system and implement unleash adapter
- PoC: !33852 (closed)
This MR accomplishes the following things:
- Generalize the `Feature` class. Currently, it depends heavily on the `Flipper` implementation, which makes it very hard to extend to other FF tools such as Unleash. We create adapters in the `FeatureFlag::Adapters` namespace to make the `Feature` class compatible with any adapter. The system always picks one adapter and keeps using it until the configuration file is changed.
- Unleash configuration is defined in gitlab.yml/gitlab.rb. This lets us turn on Unleash-FF flexibly in any environment. You can start using it in a local environment today.
- Unleash clients are initialized in initializers/unleash.rb (for Unicorn) or puma.rb (for Puma). This follows the initialization approach in the official guideline: https://github.com/Unleash/unleash-client-ruby
- If Unleash is disabled in gitlab.yml, Flipper is always used, as it is today.
- The Unleash client's activity is logged in `unleash.log` with structured logging.
- Per-project/group/user gates are supported via the `userWithId` strategy (although this requires https://gitlab.com/gitlab-org/gitlab-ee/issues/14017 to be fixed first). To enable a feature on a particular project, define a parameter following the convention `<Active Record class>:<row id>`, e.g. `Project:123`.
- This still has a problem: it relies on the polling mechanism of unleash-ruby-client, so it doesn't meet the criteria of the performance aspect.
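The adapter split and the `userWithId` convention can be sketched together: `Feature` delegates to one adapter chosen at configuration time, and the Unleash adapter matches `<class>:<id>` identifiers against the comma-separated `userIds` parameter. Everything here is a hypothetical stand-in: neither adapter wraps a real Flipper or Unleash client, this `Feature` is not GitLab's class, and `Feature.configure` is invented for illustration.

```ruby
module FeatureFlag
  module Adapters
    # Stand-in for the Flipper-backed adapter: a flag is on if its key is enabled.
    class Flipper
      def initialize(enabled_keys)
        @enabled_keys = enabled_keys
      end

      def enabled?(key, _thing = nil)
        @enabled_keys.include?(key.to_s)
      end
    end

    # Stand-in for the Unleash-backed adapter.
    # toggles: feature name => strategies, as in the Unleash client API schema.
    class Unleash
      def initialize(toggles)
        @toggles = toggles
      end

      def enabled?(key, thing = nil)
        Array(@toggles[key.to_s]).any? do |strategy|
          case strategy["name"]
          when "default" then true
          when "userWithId"
            next false if thing.nil?

            ids = strategy.dig("parameters", "userIds").to_s.split(",").map(&:strip)
            ids.include?("#{thing.class.name}:#{thing.id}") # e.g. "Project:123"
          else false
          end
        end
      end
    end
  end
end

# Hypothetical stand-in for GitLab's Feature: picks one adapter and keeps using it.
class Feature
  def self.configure(unleash_enabled:, data:)
    @adapter =
      if unleash_enabled
        FeatureFlag::Adapters::Unleash.new(data)
      else
        FeatureFlag::Adapters::Flipper.new(data)
      end
  end

  def self.enabled?(key, thing = nil)
    @adapter.enabled?(key, thing)
  end
end
```

Callers only ever see `Feature.enabled?(key, thing)`, which is what keeps the daily workflow unchanged when the adapter is switched.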