Rollout strategies (!60) · Merge requests · GitLab.org / ruby / gems / GitLab Experiment

The concept of rollouts gives us different strategies for which variant to return when a context is included in the experiment group.

experiment(:example, foo: :bar) do |e|
  e.try(:variant1) { }
  e.try(:variant2) { }
  e.try(:variant3) { }
end

The example above randomly selects one of the variants when the context is included in the experiment group, because Random is the default rollout.

class ExampleExperiment < ApplicationExperiment
  default_rollout Gitlab::Experiment::Rollout::First
end

If we have a custom experiment class, we can override the default rollout there.

experiment(:example, foo: :bar) do |e|
  e.rollout(Gitlab::Experiment::Rollout::RoundRobin)
  e.try(:variant1) { }
  e.try(:variant2) { }
  e.try(:variant3) { }
end

And we can also override the rollout any time we're running the experiment.

Previous thoughts / writeup

This is venturing down the path of rollout strategies. We're seeing more and more that multi-variant experiments need this concept, and it seems like something we should start thinking about now that we've largely solved the feature flag (boolean state) concern.

I'm starting to consider renaming the variant resolver to be more of an "inclusion" resolver for now. Should we include this context in the experiment group? That starts to minimize the overall coupling with feature flag logic like flipper or unleash (or in our case feature.rb).

This surfaces in our implementation of resolve_variant_name in ApplicationExperiment.

  def resolve_variant_name
    return variant_names.first if Feature.enabled?(feature_flag_name, self, type: :experiment, default_enabled: :yaml)

    nil # Returning nil vs. :control is important for not caching and rollouts.
  end

What this is really saying is: return the first variant, because we can't handle multiple variants by default. Right? This really determines only inclusion. I thought that would be enough for the first experiments, but it appears I was wrong, and the time may have come to consider a better path.

So basically we have our "First" rollout strategy as defined in our resolve_variant_name override. It's maybe the simplest and most limited strategy, since it's obviously ineffective on a multi-variant experiment: it just returns the first variant.
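
To make the split between inclusion and variant selection concrete, here is a minimal sketch in plain Ruby. None of these names come from the gem: InclusionResolver, FirstRollout, and the keyword arguments on resolve_variant_name are hypothetical, and the array of enabled contexts only stands in for a real feature flag check such as Feature.enabled?.

```ruby
# Hypothetical sketch: these classes are illustrations, not the gem's API.

# Answers only one question: is this context in the experiment group?
class InclusionResolver
  def initialize(enabled_contexts)
    @enabled_contexts = enabled_contexts
  end

  # Stand-in for a feature flag check (assumption).
  def included?(context)
    @enabled_contexts.include?(context)
  end
end

# The "First" strategy: always returns the first variant.
class FirstRollout
  def resolve(variant_names)
    variant_names.first
  end
end

# Inclusion is decided first; the rollout strategy only picks a variant.
def resolve_variant_name(context, variant_names, inclusion:, rollout:)
  # nil (not :control) signals "not included", as in the original override.
  return nil unless inclusion.included?(context)

  rollout.resolve(variant_names)
end

inclusion = InclusionResolver.new([:user_1])
variants = [:variant1, :variant2, :variant3]

resolve_variant_name(:user_1, variants, inclusion: inclusion, rollout: FirstRollout.new) # => :variant1
resolve_variant_name(:user_2, variants, inclusion: inclusion, rollout: FirstRollout.new) # => nil
```

With this shape, swapping the strategy would not touch the inclusion check at all, which is the decoupling described above.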

That's where Random and RoundRobin (thanks to @dstull for outlining) come into play. I've provided them as examples because they show that, while there's definitely some difference in complexity (using a cache vs. not using a cache), there's not much difference in maintainability, if that's a useful perspective.
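
The cache vs. no-cache difference can be sketched roughly as follows. This is not the code from this MR: the class names, the resolve method, and the plain Hash standing in for a cache are all assumptions made for illustration.

```ruby
# Hypothetical sketch: names and the Hash-backed "cache" are assumptions.

# Random keeps no state: every resolution is independent of the last.
class RandomRollout
  def initialize(random: Random.new)
    @random = random
  end

  def resolve(variant_names)
    variant_names.sample(random: @random)
  end
end

# RoundRobin has to remember its position, so it needs shared state.
class RoundRobinRollout
  def initialize(cache: {}, key: :round_robin_counter)
    @cache = cache
    @key = key
  end

  def resolve(variant_names)
    count = @cache.fetch(@key, 0)
    @cache[@key] = count + 1 # a real shared cache would want an atomic increment
    variant_names[count % variant_names.size]
  end
end

variants = [:variant1, :variant2, :variant3]
round_robin = RoundRobinRollout.new

4.times.map { round_robin.resolve(variants) }
# => [:variant1, :variant2, :variant3, :variant1]
```

Both classes expose the same single method, so the extra cost of RoundRobin is confined to the counter it keeps in the cache.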

Anyway, I opened this MR to see what thoughts we can come up with as a team, and to see how we feel about this. Generally it makes me pretty happy to consider the options we might eventually be able to provide everyone, and how we can enable interfacing with them through chatops commands or through a graphical interface -- or even from our unleash feature flag interface eventually.

Edited Feb 10, 2021 by Jeremy Jackson
