Add summary of benefits using experimentation.rb to experiment guide
Dallas Reedy A little bit of background on experimentation.rb:
Flipper gives us a few options when it comes to roll-out strategies (they call them “gates”) for partially releasing a feature. We’re making heavy use of two of those strategies when it comes to running experiments:
Percentage of Time: every time (even within a single request) we ask Flipper if a feature is enabled or not, it basically rolls a 🎲 to decide. Let’s say it has a 100-sided 🎲 and we want a feature enabled 25% of the time: any number it rolls between 1 and 25 causes a “yes” answer, and all other rolls cause a “no” answer.
Percentage of Actors: the algorithm Flipper uses here is a bit more complicated than the simple random die roll approach, but suffice it to say that Flipper takes a unique identifier for the actor (called the flipper_id), mixes it with the name of the experiment, and uses a hashing algorithm to cast that to an integer value. Then it just decides whether that value is within the desired percentage range or not.
The Percentage of Actors approach is the ideal approach for using Flipper as an A/B split-testing engine, but we have one specific caveat which caused us to create the experimentation.rb file: what do we do when the actor we want to experiment on is unknown to the system? In other words, before a visitor becomes a user (either by signing in or signing up), how can we reliably show that person a cohesive & stable user experience?
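To make the difference between the two gates concrete, here is a minimal Ruby sketch. It is an illustration only, not Flipper's actual code: the method names are hypothetical, and using CRC32 as the hash and 100 buckets are assumptions made for the example.

```ruby
require 'zlib'

# Percentage of Time: a fresh die roll on every call, so the answer can
# change between two checks, even within a single request.
def enabled_by_time?(percentage)
  rand(100) < percentage # rand(100) yields an integer in 0..99
end

# Percentage of Actors: hash the experiment name together with the actor's
# identifier, so the same actor always lands in the same bucket.
# (CRC32 is an assumption for this sketch.)
def enabled_for_actor?(experiment_name, flipper_id, percentage)
  bucket = Zlib.crc32("#{experiment_name}#{flipper_id}") % 100
  bucket < percentage
end
```

Calling enabled_for_actor?('some_experiment', 'User:42', 25) repeatedly gives the same answer every time, whereas enabled_by_time?(25) may flip between calls.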
The experimentation.rb file solves that specific case by making use of the Percentage of Time percentage value (e.g. 25%) and a manually created cookie value we call experimentation_subject_id (which is itself just a random number), and then we use those values to create our own pseudo-Percentage of Actors approach. The main downside of this approach is that cookies are not stable values. If the same person uses a different browser or a different device, or if they clear their browser cookies, they will likely end up in a different group (control or experimental).
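The cookie-based approach described above could look roughly like the following sketch. It is not the real experimentation.rb: the method names, the plain Hash standing in for the cookie jar, and the use of a random hex string reduced to a bucket between 0 and 99 are all assumptions for illustration.

```ruby
require 'securerandom'

# Reads the subject id from the cookie jar, creating it on the first visit.
# `cookies` is a plain Hash standing in for a real cookie jar (assumption).
def experimentation_subject_id(cookies)
  cookies['experimentation_subject_id'] ||= SecureRandom.hex(16)
end

# Maps the random id onto a bucket in 0..99.
def experimentation_subject_index(cookies)
  experimentation_subject_id(cookies).hex % 100
end

# Uses the flag's Percentage of Time value (e.g. 25) as the size of the
# experimental group instead of rolling a die on every check.
def experimental_group?(cookies, percentage_of_time_value)
  experimentation_subject_index(cookies) < percentage_of_time_value
end
```

As long as the cookie survives, the visitor stays in the same group. A new browser, a new device, or cleared cookies start from an empty cookie jar, so a new random id is drawn and the group may change.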
In other words, when we use experimentation.rb for running our experiments, we lose some amount of stickiness. The same person (the same actor) could easily end up seeing both the experimental version & the control version. I’m not sure there is an easy fix for this, but I think it’s good to know what Flipper gives us out of the box versus what experimentation.rb gives us, and, more importantly, what experimentation.rb does not give us.
Phil Calder Thanks for this, @dreedy. Perhaps you could summarize these use cases in https://docs.gitlab.com/ee/development/experiment_guide/ (I suggest you focus on what we can do, what experimentation.rb solves/enables us to do, and how/why a developer would choose to use experimentation.rb).
From a Slack thread:
I’m guessing this relates to one of your comments from the Growth Engineering Weekly today.
To try to explain what’s going on (so you don’t have to go digging through the code, if you haven’t already), Flipper has a number of different “gates” that it uses to control how feature flags behave. Any one feature flag (by name) may have several active gates.
The most basic gate is the boolean gate (on or off). This is what gets created/updated when you provide true or false to this chatops command.
Another type of Flipper gate is the percentage-of-time gate. This is what gets created/updated when you provide a numerical value (without the addition of the --actors flag).
A third type of Flipper gate that this chatops command knows about is the percentage-of-actors gate. If you provide it with a numerical value and the --actors flag, it will create/update the percentage-of-actors gate for the given feature flag.
You should check out the Flipper docs page on Gates for more information on how the gates work and in what order they take precedence.
You can also do /chatops run feature --help for a few more details on what values & options it takes. Or you can run a non-existent sub-command for some example usage: /chatops run feature foobar.
And if you wanna go spelunking in the code, this is a good place to start: https://gitlab.com/gitlab-com/chatops/-/blob/master/lib/chatops/commands/feature.rb
Something that is critical to understand about using our current Experimentation Module is that we only & exclusively check a feature flag’s percentage_of_time_value directly! We do not ask the higher-level question of “is this feature enabled?” and let Flipper decide based on all of the gates defined for it.
This means that we cannot rely on the fact that Flipper checks the boolean gate first, before moving on to all the other gate types, to roll out a feature to all users all the time by doing /chatops run feature set <some-experiment-flag> true instead of the current procedure of doing /chatops run feature set <some-experiment-flag> 100.
Because of the way we’re handling Do Not Track (DNT) settings right now, it also means that even when an experiment feature flag is rolled out to 100% of users (via /chatops run feature set <some-experiment-flag> 100), it is not necessarily presented to 100% of users. When a user has DNT enabled, we skip all experiment stuff altogether and never get to the point in our code which checks the percentage rollout for that feature flag. Those users always get the “control” experience.
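The behaviour described above (DNT short-circuits everything, and only the percentage-of-time value is consulted) can be summarized in a small Ruby sketch. This is a simplified model, not the real module: the method name, the Hash standing in for a flag's gates, and the precomputed subject_index (a bucket between 0 and 99) are hypothetical.

```ruby
# `gates` is a Hash such as { boolean: true, percentage_of_time_value: 0 }
# standing in for a feature flag's Flipper gates (assumption).
# `subject_index` is the visitor's bucket in 0..99.
# `dnt` is true when the visitor's browser asks not to be tracked.
def experiment_enabled?(gates, subject_index, dnt:)
  # DNT visitors skip experiments entirely and get the control experience.
  return false if dnt

  # Only the percentage-of-time value is consulted; the boolean gate is
  # never looked at, so setting the flag to `true` has no effect here.
  subject_index < gates.fetch(:percentage_of_time_value, 0)
end
```

In this model, a flag set to true with a percentage-of-time value of 0 enables the experiment for nobody, and a flag set to 100 still returns false for every visitor with DNT enabled.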
See also team-tasks#210 (moved)