Experimentation is confusing, in large part because there are disparate systems and approaches for how analytics are generated.
Current Experimentation Frameworks:
- Experimentation Module
- GLEX (a.k.a. gitlab-experiment)
We introduced GLEX about a year ago and have largely migrated to it, but there are still unresolved experiments that use the Experimentation Module. Aside from speaking to our struggles in prioritizing experiment clean up, this also leads to confusion when others decide to implement experiments.
There are back end concerns, spec helpers, and front end methods living in the same files, some intended for GLEX and some for the Experimentation Module. To resolve this and simplify the documentation and code, we need to work through the following (a sketch contrasting the two APIs follows the list):
- Remove front end Experimentation Module code
- Identify which experiments remain that use Experimentation Module
- Prioritize the clean up of those remaining experiments using Experimentation Module
- Remove back end Experimentation Module code
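To make the confusion concrete, here's a minimal sketch contrasting the two styles. The experiment name and the `render_control` / `render_candidate` helpers are hypothetical placeholders; the `experiment_enabled?` helper and the `experiment` DSL reflect my understanding of the legacy module and GLEX respectively, and may differ from the exact code.

```ruby
# A minimal sketch, assuming typical Rails controllers. Experiment and
# helper names below are illustrative, not real experiments.

# Experimentation Module (legacy): a boolean check, with tracking
# handled separately by other helpers.
class ExampleController < ApplicationController
  include Gitlab::Experimentation::ControllerConcern

  def show
    if experiment_enabled?(:example_experiment)
      render_candidate
    else
      render_control
    end
  end
end

# GLEX (gitlab-experiment): a block-based DSL where variant resolution,
# publishing, and tracking all hang off one experiment object.
class OtherExampleController < ApplicationController
  def show
    experiment(:example_experiment, actor: current_user) do |e|
      e.control { render_control }
      e.candidate { render_candidate }
    end
  end
end
```

Because both styles can appear side by side in the codebase today, it's easy for a new contributor to mix helpers from one framework with experiments defined in the other.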
Current Reporting Strategies:
Experiment subjects table
The Experimentation Module, as well as code added to ApplicationExperiment (GLEX), attempts to add records to the database when a record is included as the experiment subject. This is inefficient, as it blocks other database reads and writes purely for reporting capabilities.
This was only done because of constraints around tracking IDs in Snowplow events, and there was a known performance and reporting cost associated with the choice.
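For concreteness, here's a minimal sketch of the pattern, assuming a hypothetical `ExperimentSubject` ActiveRecord model; the actual table and method names differ, but the shape of the cost is the same:

```ruby
# A minimal sketch, assuming a hypothetical ExperimentSubject model.
# The point is the shape of the cost, not the exact schema.
class ApplicationExperiment < Gitlab::Experiment
  def record_subject!(subject)
    # A synchronous INSERT on the request path, used only for
    # reporting -- every experiment evaluation competes with real
    # application reads/writes for database time.
    ExperimentSubject.create!(
      experiment_name: name,
      subject: subject
    )
  end
end
```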
Context keys
GLEX generates an anonymous context key that is unique to each experiment and to each context within that experiment. This allows for very narrow inspection of an experiment, but it also requires adding custom instrumentation and connecting that instrumentation to the relevant experiment(s).
This is still a useful reporting and analytics approach, but it may not be needed anymore, so there's some cleanup we could do if we stopped, or minimized, using it.
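As a rough illustration of the idea (not GLEX's actual implementation), a context key can be derived by hashing the experiment name together with a stable serialization of the context, so the same subject always resolves to the same anonymous key:

```ruby
require 'digest'

# A minimal sketch of deriving an anonymous, deterministic context key.
# GLEX's real derivation differs in detail; this only shows the idea.
def context_key(experiment_name, context)
  # Sort the context so key ordering doesn't change the digest.
  serialized = context.sort.map { |k, v| "#{k}=#{v}" }.join('&')
  Digest::SHA2.hexdigest("#{experiment_name}:#{serialized}")
end

context_key(:example_experiment, actor_id: 42, project_id: 7)
# => a stable 64-character hex digest for this experiment/context pair
```

The narrowness follows from the key being scoped per experiment: the same user yields a different key in each experiment, so instrumentation has to be wired up to each relevant experiment individually.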
Page view events
GLEX surfaces, in the initial page view Snowplow event, all experiments that ran while rendering a given page. This enables advanced reporting around the pseudonymized objects, and even an understanding of how experiments impact and influence each other under different scenarios -- including or excluding cases where more than one experiment was assigned a specific variant.
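A minimal, self-contained sketch of the idea (class, schema URI, and method names are illustrative, not GLEX's API): collect every experiment/variant pair that resolves while rendering, then emit the whole set as one context on the initial page view event.

```ruby
require 'json'

# A minimal sketch; all names here are illustrative placeholders.
class PageViewExperiments
  def initialize
    @ran = {}
  end

  # Called whenever an experiment resolves a variant during rendering.
  def record(experiment_name, variant)
    @ran[experiment_name] = variant
  end

  # One context attached to the initial page view Snowplow event,
  # carrying every experiment that ran on the page.
  def to_event_context
    { schema: 'iglu:example/page_experiments/jsonschema/1-0-0', data: @ran }.to_json
  end
end

tracker = PageViewExperiments.new
tracker.record(:example_experiment, :candidate)
tracker.record(:other_experiment, :control)
puts tracker.to_event_context
```

Because all experiments ride on a single event, one query can include or exclude page views where multiple experiments were in specific variants.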
Conclusion:
As you can see, one of the challenges we face in enabling the wider company to run experiments is how confusing the number of approaches we've ended up with has become. I haven't found a good way to document the various reasons to choose one approach over another, or how that choice has a fairly large downstream impact on reporting and analytics.
So now that we have a bit of a write-up, we should collectively agree on which thing(s) we consolidate towards and get that work prioritized -- potentially even over the holidays, while there's some engineering time to resolve the more technical concerns.
Based on what we determine we want to do in reporting and analytics moving forward, we should start to clean up and remove the unused approaches, especially those we take a performance hit for.