Propose an A/B testing framework for testing conversion MVCs on about.gitlab.com so we can iterate, learn, and increase conversions faster - and with more apples-to-apples data than comparing time periods.
Requirements
Collaborate with appropriate stakeholders on MVC A/B test solution
Proposed Tasks
Collaborate to identify potential requirements
Set MVC requirements
Collaborate on brainstorming technology solutions (consider build vs. buy)
Propose a solution
Follow-on Issue
Create an A/B testing framework for testing conversion MVCs on about.gitlab.com so we can iterate, learn, and increase conversions faster - and with more apples-to-apples data than comparing time periods.
Co-design, develop and test solution on conversion MVC
There's an Active tests section in CODEOWNERS for about.gitlab.com to notify the DMP team when changes are made to pages with active tests.
The handbook page above has a section listing active experiments (currently empty and possibly unused; edit: verified with Matt that we aren't doing CRO right now).
When it comes to the practice of running an A/B test, this is the experiment write-up template we've developed and started using on the growth team. If possible, I think it would be great for us all to use the same format so it's easier to collaborate.
Starting with SaaS means we don't need to invest in building a tool we may or may not use frequently.
Most testing SaaS products include heatmap tools, so we wouldn't need to rely on HotJar and could run heatmap tests on larger audiences.
Feature-flag- or CDN-based testing eliminates the problem of people updating a page while a test is running.
Using GitLab feature flags
Because we use a pre-compiled static site, I don't believe this is an option, but I may be wrong. However, we should be able to use comparable tools like Optimizely's feature flagging.
Using a CDN provider to create A/B tests
Pros:
Best option for end-users browsing our site (no flashes-of-content or clashes with Cookiebot).
Nearly bullet-proof reliability.
No extra SaaS costs.
Cons:
Tests will be difficult for non-developers to create.
Difficult to find tutorials, comparisons, and documentation because of the company name.
Appears to lack feature flags.
Client-side JavaScript solutions can be fragile (see note 1 and the sketch after the notes).
Notes
The worst-case scenario for fragility would be inaccurate data or people being served the control treatment rather than a test variant. Many consider this an acceptable risk, i.e., the pros outweigh the cons.
The only tools I have personal experience with at the moment are Optimizely and Google Optimize.
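For context on the flash-of-content and fragility tradeoffs above: client-side tools generally hide the page until their script has applied the test's changes. Here's a generic sketch of that anti-flicker pattern (illustrative only, not any specific vendor's snippet):

```typescript
// Generic sketch of the anti-flicker pattern client-side testing tools use;
// the timeout value and approach vary by vendor, so treat this as illustrative.
document.documentElement.style.opacity = '0';

// Fail open: if the testing script never applies its changes, reveal the
// page (as the control) after a timeout instead of leaving it blank.
const failOpen = window.setTimeout(() => {
  document.documentElement.style.opacity = '';
}, 2000);

// The testing script would call something like this once changes are applied.
function revealPage(): void {
  window.clearTimeout(failOpen);
  document.documentElement.style.opacity = '';
}
```

This is also where the fragility in note 1 comes from: if the script fails or times out, visitors see the control and the data undercounts the variant.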
We don't need to choose just one tool. We could try LaunchDarkly for some developer-driven tests in combination with a WYSIWYG SaaS for other tests where marketers need a WYSIWYG. Early adoption of LaunchDarkly may also facilitate the upcoming website refresh with outsourced vendors.
After we get a CMS, we may no longer need a WYSIWYG-based testing option. At the very least I'd recommend reevaluating at that time.
If we want to hit the ground running at no cost, I recommend Google Optimize.
For more feature-packed WYSIWYG-based options, I recommend trying out Convert.com.
For options without a WYSIWYG, I recommend trying LaunchDarkly (see the SDK sketch after this list). Feature flags are a modern web development paradigm that has slowly been replacing A/B testing tools at many companies.
For money-is-no-object options, Optimizely is hard to beat.
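To make the LaunchDarkly option concrete, here is a minimal sketch using their JavaScript client-side SDK. The client-side ID, flag key, and CSS class are hypothetical placeholders, and exact initialization options may differ by SDK version:

```typescript
// Minimal sketch, assuming the launchdarkly-js-client-sdk package; the
// client-side ID, flag key, and class name below are placeholders.
import { initialize } from 'launchdarkly-js-client-sdk';

const client = initialize('YOUR-CLIENT-SIDE-ID', { key: 'anonymous-visitor' });

client.on('ready', () => {
  // variation() returns this visitor's assigned value for the flag,
  // falling back to the default (false) if the flag can't be evaluated.
  const showNewHero: boolean = client.variation('homepage-hero-test', false);
  if (showNewHero) {
    document.body.classList.add('hero-variant-b');
  }
});
```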
At this time, for A/B testing, I recommend we choose between:
Google Optimize
Convert.com
If it were up to me, I'd start with LaunchDarkly and see if it's a viable option, since it would give us benefits beyond just A/B testing.
In the event we need a conversion WYSIWYG for smaller changes, I recommend Convert.com. I don't see enough need at the moment to justify spending on an expensive tool like Optimizely. We may not even have enough need for Convert.com and may want to go with Google Optimize instead.
Also, @shanerice, any thoughts or insights on my analysis above, particularly regarding Google Optimize vs. Optimizely, since I know GitLab has a history with those two?
We were limited to CSS/JS-based testing with Optimizely, which caused performance problems when tests made large changes. The workaround required the ability to deploy unique test pages, and that's something we couldn't do with about.gitlab.com at the time. I think this may have been related to a security requirement on our side, but I'm not 100% sure.
What do I mean by a larger test? At one point our previous paid agency built a new free-trial page for us to test, and it loaded slowly. They suggested moving to a redirect test that sent people to a /free-trial/a style URL, and that wasn't possible on our side.
Optimize is fine, but the default deployment caused the site to load slowly for people internally. The tests I've run with it have been small and targeted, but my assumption is we'd see similar performance issues to Optimizely with larger tests.
Depending on their scale, we're going to need a developer to help build elements of most tests.
A few questions I'd love your input on, @brandon_lyon:
Could we use any of the SaaS options to deploy branch-based testing?
Could we set up a unique instance of a CMS for testing?
Would it make sense to create a hybrid approach, using a SaaS product for simpler tests and building larger tests with branch-based testing?
Can we use feature flags with whatever solution we use?
Agreed, large-scale changes don't work well with JavaScript/CSS-based SaaS solutions, and they have suboptimal performance.
I think a hybrid approach makes sense, as outlined above. Your experience with small vs. large changes lends further weight to it.
Branch-based tests would work; that's more along the lines of the CDN or feature flag option. The difficulty would be integrating any other type of test (like Google Optimize) at the same time, since I'm not sure how server load balancing would take those into account.
Yes, feature flags can work with whatever solution we choose. Some solutions have them built in, but some do not; either way, they could coexist with a solution like LaunchDarkly.
> The difficulty would be integrating any other type of test (like Google Optimize) at the same time, since I'm not sure how server load balancing would take those into account.
I think we should probably limit tests to one method on any page at a given time.
> Yes, feature flags can work with whatever solution we choose. Some solutions have them built in, but some do not; either way, they could coexist with a solution like LaunchDarkly.
Do we have any sense of how hard it would be to set up feature flags on Cloudflare? @laurenbarker, do you know if the static site team is considering anything related to feature flags on their roadmap?
A feature flag just says "turn on or off this part of the site without releasing new code". This could be full pages or parts of a page depending on configuration.
Feature flags within our static-site setup would be more of a replacement for the CDN approach than something that works alongside it. In Fastly and Cloudflare it is possible to implement load-balancer-based A/B testing, but there would be more manual configuration and code complexity, as outlined above.
Feature flags as implemented in LaunchDarkly are essentially the same load-balancer-based CDN setups I described above, but created automatically via a GUI instead of manual infrastructure code.
The GitLab product can use GitLab feature flags, which coexist with the CDN infrastructure, because it doesn't precompile static assets the way we do.
I'm not sure feature flags are going to be relevant for our A/B testing unless they're used to compile different versions of the site. @brandon_lyon @shanerice
Seems like routing A/B testing through Cloudflare/Fastly would be the JAMstack way of doing this. This is how Netlify does it, and I recommend we use a similar approach, @brandon_lyon.
Note: I think we're going to be testing between full builds of our site. I imagine the server looking something like this, with our CDN deciding which version of the site users get served:
```
/public
  /version-a    # full build of about.gitlab.com - version A
  /version-b    # full build of about.gitlab.com - version B
```
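As a rough illustration of that CDN routing layer, here is a hedged sketch of a Cloudflare Worker that pins each visitor to one build via a cookie. The cookie name, 50/50 split, and week-long expiry are assumptions for illustration, not a vetted configuration:

```typescript
// Hedged sketch, not production config: a Cloudflare Worker that assigns
// each visitor to one full build of the site and keeps them there via a
// cookie. Cookie name, split ratio, and directory names are assumptions.
addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request: Request): Promise<Response> {
  const url = new URL(request.url);
  const cookies = request.headers.get('Cookie') ?? '';
  const existing = cookies.match(/ab_build=(version-[ab])/);

  // Sticky assignment: reuse the visitor's build if they already have one.
  const build = existing
    ? existing[1]
    : Math.random() < 0.5 ? 'version-a' : 'version-b';

  // Serve the same path out of the chosen build directory.
  url.pathname = `/${build}${url.pathname}`;
  const response = await fetch(new Request(url.toString(), request));

  // Persist the assignment so the visitor sees a consistent site.
  const headers = new Headers(response.headers);
  if (!existing) {
    headers.append('Set-Cookie', `ab_build=${build}; Path=/; Max-Age=604800`);
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```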
@laurenbarker LaunchDarkly feature flags are a frontend in front of their own Fastly CDN, doing similar things to what you describe but without the branch aspect.
The difficulty with branch-based testing is that each branch will not get updates from the rest of the site, i.e., someone stuck on a certain branch won't see updated handbook documentation, event notifications, etc., unless that branch is frequently updated or until the test ends.
Feature flags, on the other hand, all take place on the same branch. Versions of a tested item are essentially wrapped in a giant if-else statement (sketched below); the item could be a shared component, an entire page, or multiple pages. The branch would always be up to date. An added benefit is that large changes can be merged into master in small increments and released to production without being sent traffic, preventing a messy giant merge later.
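In sketch form, the wrapping described above looks something like this; the flag name and render functions are hypothetical placeholders, not real site code:

```typescript
// Generic sketch of the if-else wrapping described above; the flag name and
// render functions are hypothetical placeholders.
declare function renderCurrentPricingPage(): void;
declare function renderRedesignedPricingPage(): void;

interface FlagClient {
  variation(key: string, fallback: string): string;
}

function renderPricingPage(flags: FlagClient): void {
  // Both versions live on master; the flag decides which one a visitor sees.
  const variant = flags.variation('pricing-page-test', 'control');
  if (variant === 'redesign') {
    renderRedesignedPricingPage(); // merged and deployed, but gated by the flag
  } else {
    renderCurrentPricingPage(); // control, served when the flag is off or unavailable
  }
}
```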
Let's take a step back and outline the types of tests we might run and what their requirements/difficulties are in order to further evaluate which solution might be good for each type of test.
What kind of things will we test, conceptually/overall?
I began by outlining tools. Then I took a step back and said "we should outline what we want to do with the tools first". Let's take one step further back and ask...
Who are these tools for?
At this time the answer is primarily "people who can code and understand UX and good design", ie Brandon, Stephen, Lauren, and possibly the growth team.
This means that at this time we probably don't need the WYSIWYG, though it would be nice to have. We also already have the GitLab WebIDE and will eventually have a CMS. Do we really need a WYSIWYG? Probably not at this time.
Who are these tools NOT for?
At this time the answer is primarily "people who don't understand, UX and good design". The likelihood that we need a WYSIWYG which can change things like color and typography without coder intervention is low.
@brandon_lyon I realize this is high-level planning, but when we get further into the process let's talk more about how we can differentiate the results in Google Analytics.
I've used unique CSS classes in tests to make it simpler to report results from GA, but this may not be needed if we have unique URLs.
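One option, assuming gtag.js is on the page, is to push the variant as an event so results can be segmented in GA; the event and parameter names below are hypothetical, not an agreed-upon schema:

```typescript
// Hedged sketch assuming gtag.js is already loaded on the page; the event
// name and parameters are hypothetical, not an agreed-upon schema.
declare function gtag(...args: unknown[]): void;

function reportVariant(experimentId: string, variant: string): void {
  // Record which variant this visitor saw so results can be segmented in GA.
  gtag('event', 'ab_test_impression', {
    experiment_id: experimentId,
    variant: variant,
  });
}

reportVariant('homepage-hero-test', 'version-b');
```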
Per Zoom discussions, this plan is generally supported, and we can move forward with implementation. I've created an epic for tracking implementation issues: &290 (closed)