Develop a strategy for using feature flags in e2e tests

Problem

We don't yet have a coherent strategy for using feature flags in e2e tests. As a consequence, we sometimes run into situations like this:

An engineer enables a feature flag on Staging as part of a feature flag rollout.
An end-to-end (QA) test fails during the next Staging test run (which may be hours later).
A Quality pipeline triage DRI notices the failure, or someone from the Delivery team notifies Quality that a failing test has blocked deployment.
The pipeline triage DRI investigates the failure and tries to troubleshoot the issue, involving any available engineers from the team responsible for the feature.
Eventually they discover that the test failed due to the newly enabled feature flag, and determine that the test needs to be updated.
The failing test is quarantined, or the feature flag is disabled on Staging, to unblock deployment.
The relevant Quality counterpart SET is notified that the test needs to be updated.

This causes unnecessary disruption for the Delivery and Engineering teams, extra work for the Quality pipeline triage DRI, as well as unscheduled work for the counterpart SET.

Proposal

Step 1 - Update Quad-Planning Documentation

We should update the quad-planning handbook page so that the planning process includes a checklist item to confirm whether tests need to be updated because of feature flag usage.

This will help ensure we're aware of the need for a test update as early as possible, and avoid causing disruption later in the dev cycle.

Update the docs: gitlab-com/www-gitlab-com!88542 (merged)

Step 2 - Manually determine when tests need to be updated before they block deployment

gitlab-org/gitlab!51178 (merged) documents how to confirm that end-to-end tests pass with a feature flag enabled before the flag is enabled on Staging or GitLab.com.

This is intended as a brief iteration before an automated process is put in place. The aim is to involve as few people as necessary, with as little disruption as possible while we work on an automated process.

We should also clarify that the documentation in gitlab-org/gitlab!51178 (merged) is about avoiding blocked deployments caused by feature flag changes, and that merely following that documentation isn't sufficient testing on its own.

Step 3 - Automatically determine when tests need to be updated

As @ayufan noted:

...we know when a default state of feature flag changes, because of the introduction or modification of YAML file. Our pipeline can detect and run QA with FF enabled and disabled as part of development merge request automatically via the package-qa... job.
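To make the detection idea concrete, here is a minimal sketch of how a pipeline step could work out which flags had their default state introduced or changed in a merge request. It is an illustration, not the actual pipeline logic: the `config/feature_flags/` path pattern, the `changed_flags` helper, and the shape of its input (path mapped to old and new file contents) are all assumptions, and a real implementation would parse the YAML properly rather than use a regular expression.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed location of feature flag definition files; adjust to the real layout.
FLAG_PATH = re.compile(r"^(?:ee/)?config/feature_flags/.+/(?P<name>[^/]+)\.yml$")
# Matches a top-level "default_enabled: true|false" line in the definition file.
DEFAULT_ENABLED = re.compile(r"^default_enabled:\s*(true|false)\s*$", re.MULTILINE)


def default_enabled(yaml_text: Optional[str]) -> Optional[bool]:
    """Return the default_enabled value, or None if the file or key is missing."""
    if yaml_text is None:
        return None
    match = DEFAULT_ENABLED.search(yaml_text)
    return None if match is None else match.group(1) == "true"


def changed_flags(changes: Dict[str, Tuple[Optional[str], Optional[str]]]) -> Dict[str, bool]:
    """Map flag name -> new default state for flags whose default changed.

    `changes` maps a file path to an (old_text, new_text) pair. old_text is
    None for a newly added file, new_text is None for a deleted file.
    """
    result = {}
    for path, (old_text, new_text) in changes.items():
        match = FLAG_PATH.match(path)
        if match is None:
            continue  # not a feature flag definition file
        old, new = default_enabled(old_text), default_enabled(new_text)
        if new is not None and old != new:
            result[match.group("name")] = new
    return result
```

A newly added definition file counts as a change because its previous state is unknown, which matches the "introduction or modification" wording in the comment above. Each flag returned here would then be a candidate for a QA run with the flag enabled and another with it disabled.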

We could implement that using @grzesiek's suggestion of a QA_FEATURE_FLAGS environment variable that would prompt GitLab QA (via package-and-qa) to run the tests with the specified feature flag set (true/false). mlapierre is working on this in https://gitlab.com/gitlab-org/quality/team-tasks/-/issues/987
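As a rough illustration of the QA_FEATURE_FLAGS idea, the sketch below parses such a variable into a mapping of flag name to desired state. The value format (`name=true,other=false`) is an assumption made for this example only; the proposal does not specify one, and the helper names are hypothetical.

```python
import os
from typing import Dict, Mapping


def parse_feature_flags(raw: str) -> Dict[str, bool]:
    """Parse an assumed "flag_a=true,flag_b=false" string into {name: state}."""
    flags = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        name, sep, state = item.partition("=")
        name, state = name.strip(), state.strip().lower()
        if not sep or not name or state not in ("true", "false"):
            raise ValueError(f"invalid feature flag entry: {item!r}")
        flags[name] = state == "true"
    return flags


def flags_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Read QA_FEATURE_FLAGS from the environment; unset means no flags."""
    return parse_feature_flags(environ.get("QA_FEATURE_FLAGS", ""))
```

Rejecting malformed entries instead of ignoring them is a deliberate choice in this sketch: a silently skipped flag would make a test run look like it covered a state that it never exercised.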

Further considerations

There are several ways that we can enable and disable feature flags, each with its own pros and cons. Here are some relevant comments from @grzesiek:

From what I understand this can be done in at least three different ways:

Modify an end-to-end test and set a feature flag explicitly there

Modify your feature and make a feature flag enabled by default

Set a feature flag explicitly on the instance (presumably GDK or GCK) before running end-to-end tests against such instance.

... None is perfect, there are pros and cons everywhere, like:

Modifying an end-to-end test will not test the default behavior (default_enabled: setting) unless you add two different contexts to check each case. Am I correct here?

Modifying your feature and setting default_enabled: true just for the sake of running a test, will be only a one-time check, not reproducible until a feature flag is toggled on staging / production. Am I correct here?

Setting a feature flag explicitly on an instance overrides default settings in your local development environment leading to a configuration drift.

I think those observations are correct, so a complete strategy should account for them.
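One way to account for the configuration-drift concern is to scope any explicit flag override to the test that needs it and restore the previous state afterwards. The sketch below shows that pattern with an in-memory stand-in for an instance. `FakeInstance`, `feature_flag`, and `run_in_both_states` are hypothetical names for illustration and do not correspond to GitLab QA's API.

```python
from contextlib import contextmanager


class FakeInstance:
    """In-memory stand-in for an instance's explicit feature flag overrides."""

    def __init__(self):
        self.overrides = {}  # flag name -> bool; absent means "use the default"

    def get(self, name):
        return self.overrides.get(name)

    def set(self, name, enabled):
        self.overrides[name] = enabled

    def clear(self, name):
        self.overrides.pop(name, None)


@contextmanager
def feature_flag(instance, name, enabled):
    """Override a flag for the duration of a block, then restore what was there."""
    previous = instance.get(name)
    instance.set(name, enabled)
    try:
        yield
    finally:
        if previous is None:
            instance.clear(name)  # no override existed, so fall back to the default
        else:
            instance.set(name, previous)


def run_in_both_states(instance, name, test):
    """Run `test` once with the flag enabled and once with it disabled."""
    results = {}
    for enabled in (True, False):
        with feature_flag(instance, name, enabled):
            results[enabled] = test()
    return results
```

This covers both explicit states and leaves the instance as it was found, but it still does not exercise the flag's default state, so the first observation above continues to apply.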

Also, in some cases we might want to enable a feature flag before running a whole suite of tests.

Run E2E tests with all feature flags enabled by default.

All feature flags are enabled automatically in unit/feature specs, but we don't do that in E2E tests. This has caused some confusion, and we shouldn't need to rely on documentation that can easily be missed.

~~- [ ] Build feature into GitLab to allow updating multiple feature flags at a time.~~
~~- [ ] Run E2E tests with all feature flags enabled by default.~~

[Edit: Decided not to go ahead with this because it would result in too much noise, and it would be hard to figure out what caused any failures]

Trigger full suite of QA tests when a feature flag is changed. (@sliaquat working on this)

We send a Slack notification to a QA channel when a feature flag is changed. However, to run the QA tests we rely on scheduled pipelines, or ones that are triggered by a deployment. That may be sufficient in some cases, but it could result in a substantial delay before QA test results are generated and reviewed, especially since pipelines that include the full suite run less frequently than smoke tests, and the changes behind the feature flag might not be covered by smoke tests.

Tasks:

Trigger the full suite of QA tests against the relevant environment when a feature flag is changed and include the pipeline URL in the message sent to the relevant QA channel => MR
Send Slack user id to ChatOps Jobs => MR
Send chat service's user id as CHAT_USER_ID to triggered pipelines so that it can be later used to @mention the user in case of a failure => MR
Do not trigger end-to-end tests when a feature flag is toggled for a specific group or a project. => MR
@mention the user on Slack when an end-to-end test triggered due to their FF toggle fails. => MR
Enable TRIGGER_E2E_TESTS env variable on the ChatOps project.
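The tasks above can be sketched as a small decision-and-payload helper. This is a sketch under stated assumptions, not the ChatOps implementation: the `FlagToggle` type, the function names, the `"true"` value for TRIGGER_E2E_TESTS, and the message wording are all hypothetical. Only the CHAT_USER_ID and TRIGGER_E2E_TESTS variable names come from the task list.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class FlagToggle:
    """Hypothetical record of a feature flag change made through ChatOps."""
    name: str
    enabled: bool
    chat_user_id: str
    project: Optional[str] = None  # set when the toggle targets one project
    group: Optional[str] = None    # set when the toggle targets one group


def should_trigger_e2e(toggle: FlagToggle, environ: Mapping[str, str]) -> bool:
    """Trigger only when enabled via TRIGGER_E2E_TESTS and the toggle is instance-wide."""
    if environ.get("TRIGGER_E2E_TESTS") != "true":  # assumed opt-in value
        return False
    return toggle.project is None and toggle.group is None


def pipeline_variables(toggle: FlagToggle) -> Dict[str, str]:
    """Variables passed to the triggered pipeline so a failure can be attributed."""
    return {"CHAT_USER_ID": toggle.chat_user_id}


def failure_message(chat_user_id: str, flag_name: str, pipeline_url: str) -> str:
    """Slack message that @mentions the user; <@ID> is Slack's mention syntax."""
    return (
        f"<@{chat_user_id}> end-to-end tests failed after `{flag_name}` "
        f"was toggled: {pipeline_url}"
    )
```

Keeping the trigger decision separate from the payload makes the group or project exclusion easy to test on its own, without a Slack or pipeline dependency.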

Edited Oct 21, 2021 by Sanad Liaquat