We should ask `WHY they should NOT use feature flags` instead of asking `WHY they should use them`

We had a lot of interesting conversations about Feature Flags in !49772 (merged).

I will copy my sentiment about feature flag usage here: why they are a useful tool from my perspective, and why we should rethink the way we think about feature flags. Here I discuss different aspects of changing that perspective:

to ask people WHY they should NOT use feature flags: find a reason why a feature flag CAN be skipped
INSTEAD of asking people, as we do today, WHY/WHEN they should use a feature flag: find a reason why a feature flag is NEEDED

This strongly shapes the thinking process of the person making the decision, which in turn has implications for the quality and stability of the codebase and our GitLab.com deployments, or in a broader sense our Velocity.

Comment 1: About perception of Feature Flags: !49772 (comment 465191856)

I think we wrongly assume that a feature flag is Configuration. A feature flag might resemble configuration, but it is not. Configuration is by definition something that is user-manageable. Feature flags are not. The target user of a feature flag is a developer or the SRE team. Feature flags exist to ensure that we can safely roll out our work on our terms. If we use Feature Flags as Configuration, we are doing it wrong. If something needs to be configuration, it should be configuration from the first moment. Regardless of that, in all cases we should aim for Convention over Configuration.

A majority of new code should be able to be developed without using feature flags

I kindly disagree with how it is written.

I think it puts the accent on the wrong objective: I don't think that someone should think in terms of whether the following criteria are met. Rather, the person should think about why their feature is so special that it does not need a feature flag. Most of the time there is not really a reason why it does not need a feature flag. In the past these statements (even though documented) were severely underestimated and not fully understood by people reading them. GitLab as a whole suite is complex and its dependencies are hard to model, to the point that I would question whether we can ever say with 100% confidence that something is safe to run. We usually don't know that. I was part of a number of RCAs where a feature was deemed not to require a feature flag:

it was small

it was not affecting anything else

but it caused a production incident

that made around 10-15 people sit on a call to hot-patch production for a few hours

that made people work under time pressure to ensure that it is not affecting our customers and that our SLAs are met

Thus, I'm echoing @marin's statement here: !49772 (comment 465165572). Unless we have a strong testing suite, fast rollback, and other mechanisms in place, the perception that feature flags can be skipped will actually worsen the quality of the software being released and cause significant harm to development.

I think we should aim to make feature flags as frictionless as possible and make them a part of everyone's daily routine. Someone may say that using feature flags reduces velocity or makes things harder to do. I can challenge that perception. I develop a lot of features myself, and I know from my own work that adding a minimal feature flag for even the smallest feature actually improves the work that I do. I'm less concerned about the impact, because I control the time when the feature is released to the public, and I know how to ensure that it works properly. In my experience, there were a number of times when a simple toggle allowed me to revert a change because some aspects had been underestimated.
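To make "a minimal feature flag for even the smallest feature" concrete, here is a minimal sketch in Python. It is not GitLab's actual feature flag API: `FlagStore`, `render_empty_state`, and the flag name `new_empty_state` are hypothetical names used only for illustration. The point it tries to show is that the new code path sits behind a check that defaults to off, so reverting is a toggle rather than a hot-patch.

```python
class FlagStore:
    """Hypothetical in-memory flag store (illustration only)."""

    def __init__(self):
        self._enabled = set()

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def enabled(self, name):
        # Unknown flags count as disabled, so new code is off by default.
        return name in self._enabled


def render_empty_state(flags):
    # The new behaviour is only reachable while the flag is on.
    if flags.enabled("new_empty_state"):
        return "new empty state"
    # The old, known-good path stays in place until the flag is removed.
    return "old empty state"


flags = FlagStore()
before = render_empty_state(flags)   # flag is off by default: old path
flags.enable("new_empty_state")
during = render_empty_state(flags)   # rolled out on our terms: new path
flags.disable("new_empty_state")
after = render_empty_state(flags)    # reverted with a toggle: old path again
```

Keeping the old path intact until the flag is removed is what makes the revert cheap; the cost is that the flag has to be cleaned up later.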

It is not noticeable when something is disabled or reverted quickly, as this is a positive outcome of the usage. What we do see is the RCA, because an incident has a widespread negative impact. This negative impact is easily discoverable due to its severity. Maybe we have fewer RCAs now because we are more aware of safe rollout procedures via feature flags? Maybe the outcome is that we don't see the positive impact of feature flag usage, and instead see it as potential toil.

Comment 2: About the Velocity and speed of shipping changes: !49772 (comment 468078241)

I'm not saying feature flags aren't needed... what I'm saying is that given the way things currently work and the amount of work individual engineers are performing on a regular basis the more feature flags, inevitably the longer it takes to actually get something delivered.

I think that we may have a different understanding of what Velocity means. Velocity is not about shipping an individual change, but rather a set of changes that provide a working MVC of the feature. Iteration helps here to actually ship the right things, which results in a better understanding of the MVC. If we create a gate and force ourselves to do things sequentially, this will always be a problem. The FF actually allows us to do it better: to ship many small things that overlap. As you control when things are run, you can pick the right moment to validate the changes, and for example decide to toggle the FF for everyone or for some users on GitLab.com. Getting things early is not the same as getting things unfinished and untested. They don't have to be polished, but they should be functional. The FF allows you to validate that they are actually functional at scale, something that is hard to get right when testing locally.
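Toggling a flag "for everyone or some users" can be sketched as a percentage rollout keyed on a stable hash of the user. This is only an illustration of the general technique, not a description of how GitLab implements it; `bucket`, `enabled_for`, the flag name `example_feature`, and the user ids are all hypothetical.

```python
import hashlib


def bucket(flag_name, actor_id):
    """Map a (flag, actor) pair to a stable number in 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{actor_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def enabled_for(flag_name, actor_id, percentage):
    # The same actor always lands in the same bucket, so raising the
    # percentage only ever adds actors; nobody flips back and forth.
    return bucket(flag_name, actor_id) < percentage


users = [f"user-{i}" for i in range(1000)]
at_0 = [u for u in users if enabled_for("example_feature", u, 0)]
at_25 = [u for u in users if enabled_for("example_feature", u, 25)]
at_100 = [u for u in users if enabled_for("example_feature", u, 100)]
```

Including the flag name in the hash means different flags pick different subsets of users, so the same people are not always the first to see every change.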

As for the recommendations, they are a "should", which means they depend on the case. For example, a complex feature like Merge Request Trains will take significantly longer to mature, but changing a button color or adding an empty state can be as quick as a few hours. All of that is for the developer to decide. A high priority bug, if it is really high priority, will even be picked into a patch release.

As for the features, the items land on GitLab.com within a day, and our customers can start using them. They might not be in the release post for a given milestone, but maybe this is sometimes even better, as once we enable a feature on GitLab.com we might actually "polish it a little more". Our customers running on-premise rarely like to be beta testers of features, and I have heard from a number of them that they wait for the .10 patch release to ensure that the most annoying bugs are ironed out. Of course, this is the cost of shipping many small changes, and doing that quickly. The FF allows us to figure out the right moment when the feature is ready, but if someone is very interested they can even start using it today. Consider that today, if you find a high priority bug, we might fix it the next day for GitLab.com, but we degrade the experience for our on-premise customers, as they are less willing to upgrade frequently. This generates load on the support team in some cases, and results in a bad reputation.

This is why Velocity is not about shipping unfinished changes. They have to be complete and tested; they might not be fully polished, but they have to be functional. A number of times I saw that a developer merged something buggy, or something that simply worked differently than what was intended to be delivered. The FF allows us to validate that without causing harm to the product.

Then my question would be: for each feature that we ship, maybe the PM should validate that it works as intended, described, and designed, and can actually be announced in a release post? I'm now curious how often we do that. Given that there were times when we published features in a blog post that were behind a feature flag that was off by default, I'm kind of unsure...

Comment 3: About amount of feature flags: !49772 (comment 468063819)

Some would argue 100s is a problem today

I don't think that even the current number of 380 is a problem, given that we never actually do an active cleanup of the feature flags. This translates to roughly 1-2 per engineer. I don't think it is a lot. When we start to actively schedule removal of the outdated feature flags, we can actually get down to 100.

We now also have very detailed stats about our feature flag usage, and some amount of work scheduled to clean up the old ones: https://app.periscopedata.com/app/gitlab/792066/Engineering-::-Feature-Flags
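As a rough illustration of what actively scheduling removal could look like, the sketch below lists flags that are older than a chosen age so that they can be reviewed for cleanup. Everything in it is an assumption made for the example: the `stale_flags` helper, the 90-day threshold, the flag names, and the dates are hypothetical and not taken from GitLab's tooling or data.

```python
from datetime import date, timedelta


def stale_flags(flags, today, max_age_days=90):
    """Return names of flags introduced more than max_age_days ago, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    old = [(introduced, name) for name, introduced in flags.items() if introduced < cutoff]
    return [name for introduced, name in sorted(old)]


# Hypothetical flags mapped to the date they were introduced.
flags = {
    "flag_a": date(2020, 1, 15),
    "flag_b": date(2020, 11, 30),
    "flag_c": date(2020, 6, 1),
}
candidates = stale_flags(flags, today=date(2020, 12, 17))
# candidates == ["flag_a", "flag_c"]
```

A list like this only nominates candidates; whether a flag can really be removed still depends on whether its rollout has finished.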

Remarks

This issue is not about using feature flags always and in all cases, but rather about thinking through in which cases it is acceptable to skip feature flags. In such cases, the decision should be made on objective grounds. Cases like these could be:

maybe, completely new features that no one is yet exposed to, unless they click an option on a Sidebar

Edited Dec 17, 2020 by Kamil Trzciński