Automated (pipeline) test planning for maintenance mode

Update 1/19: I've created a separate issue for (non-automated) feature testing leading up to the release, and moved two of the deliverables to that issue.

Original epic

New end-to-end tests would check that:

Users cannot write data while in read-only mode, and regain write access when read-only mode is turned off. This would be tested for each data type. For Git repositories, this includes Git pushes redirected (HTTP) or proxied (SSH) from a secondary.
Users can log in to a secondary node when the primary is in read-only mode
Admins can perform certain admin-related (write) functions while in read-only mode:
- Edit application settings via UI, particularly turning off maintenance mode and editing the maintenance mode message
- (If applicable) Can test write functions before turning off maintenance mode for rest of users

Should these be UI-based or API-based tests?

I think a combination of UI and API tests would be useful. It would also help to know which actions users perform through the command line versus the browser.

UI tests to check messaging around maintenance mode being enabled/disabled
API tests (if possible) if we want to test several types of read/write requests. One note about API tests: we used Postman to run some API tests on staging, and the HTTP status codes matched neither what we expected nor what we saw with cURL requests.

Where should maintenance mode end-to-end test code live?

Maintenance mode is not stage-specific or Geo-specific, so tests could live in a common folder like qa/qa/specs/features/ee/api/maintenance_mode and qa/qa/specs/features/ee/browser_ui/maintenance_mode
Note that maintenance mode tests requiring Geo deployment should be tagged with metadata :geo
If tests live in different folders, they should have a meta tag :maint_mode
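To illustrate how the two tags could drive test selection, here is a minimal sketch in plain Ruby. It does not use the real QA framework: `SpecFile`, `select_specs`, and the spec paths are hypothetical names made up for this example, and the tags simply mirror the :geo and :maint_mode metadata proposed above.

```ruby
# Hypothetical stand-in for a spec file and its RSpec-style metadata tags.
SpecFile = Struct.new(:path, :tags)

# Returns the paths of specs that carry every required tag
# and none of the excluded tags.
def select_specs(specs, required: [], excluded: [])
  specs.select { |spec|
    (required - spec.tags).empty? && (spec.tags & excluded).empty?
  }.map(&:path)
end

# Illustrative spec list; these paths are invented for the example.
SPECS = [
  SpecFile.new('api/maintenance_mode/write_blocked_spec.rb', [:maint_mode]),
  SpecFile.new('browser_ui/maintenance_mode/banner_spec.rb', [:maint_mode]),
  SpecFile.new('browser_ui/geo/push_to_secondary_spec.rb', [:maint_mode, :geo]),
  SpecFile.new('browser_ui/create/new_project_spec.rb', [])
].freeze

# A non-Geo environment would run only the maintenance mode specs
# that do not require a Geo deployment.
NON_GEO_RUN = select_specs(SPECS, required: [:maint_mode], excluded: [:geo])

# A Geo environment could run every maintenance mode spec.
GEO_RUN = select_specs(SPECS, required: [:maint_mode])
```

With tags like these, the folder a spec lives in no longer matters for selection, which is the point of adding :maint_mode when tests are spread across folders.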

Should maintenance mode tests run in their own pipeline job (e.g. ee:maintenance_mode)?

My first instinct is that these tests should run in a dedicated pipeline job. If maintenance mode tests run in a job with other types of tests, and a maintenance mode test fails to disable maintenance mode, the rest of the tests in that job would be affected. But this might be handled sufficiently by :after contexts to disable maintenance mode.
Some tests will require a Geo setup. We could run all maintenance mode tests against a Geo setup, so that non-Geo and Geo maintenance mode tests run together and separately from other tests.
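The cleanup concern can be sketched in plain Ruby. This is only an illustration under stated assumptions: `FakeInstance` and `with_maintenance_mode` are hypothetical names, and the fake stands in for a real test environment. In the real suite an RSpec `after` hook would play the role of the `ensure` clause, since both run even when the test body fails.

```ruby
# Hypothetical in-memory stand-in for a test instance; not a real client.
class FakeInstance
  class ReadOnlyError < StandardError; end

  attr_reader :records

  def initialize
    @maintenance_mode = false
    @records = []
  end

  def maintenance_mode?
    @maintenance_mode
  end

  def enable_maintenance_mode
    @maintenance_mode = true
  end

  def disable_maintenance_mode
    @maintenance_mode = false
  end

  # Writes are rejected while the instance is read-only.
  def write(record)
    raise ReadOnlyError, 'instance is read-only' if maintenance_mode?

    @records << record
  end
end

# Enables maintenance mode for the duration of the block and always
# disables it afterwards, even when the block raises.
def with_maintenance_mode(instance)
  instance.enable_maintenance_mode
  yield instance
ensure
  instance.disable_maintenance_mode
end

instance = FakeInstance.new
# Test data is created before maintenance mode is enabled.
instance.write('project created before maintenance mode')

begin
  with_maintenance_mode(instance) { |i| i.write('blocked write') }
rescue FakeInstance::ReadOnlyError
  # The write is rejected, and the ensure clause has already run.
end
```

After the block exits, `instance.maintenance_mode?` is false again, so later tests in the same job would not inherit a read-only instance.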

How should test data be set up for maintenance mode tests?

Maintenance mode tests need to set up any required test data before enabling maintenance mode (because you can't write anything in maintenance mode).
Tests also need to make sure maintenance mode is disabled at the end of each test (via API).
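As a rough sketch of the "via API" step, the request below is built with Ruby's standard `net/http` library but not sent. It assumes the application settings endpoint (`PUT /api/v4/application/settings`) accepts a `maintenance_mode` parameter and an admin token in a `PRIVATE-TOKEN` header; that should be verified against the API documentation for the version under test. The helper name, host, and token are placeholders.

```ruby
require 'net/http'
require 'uri'

# Builds (but does not send) a request that turns maintenance mode on or off.
# base_url and token are placeholders supplied by the caller.
# Assumption: the settings endpoint accepts a maintenance_mode parameter.
def maintenance_mode_request(base_url, token, enabled:)
  uri = URI.join(base_url, '/api/v4/application/settings')
  request = Net::HTTP::Put.new(uri)
  request['PRIVATE-TOKEN'] = token
  request.set_form_data('maintenance_mode' => enabled.to_s)
  request
end

# Example: the request a test would send during cleanup.
REQUEST = maintenance_mode_request('https://gitlab.example.com', 'ADMIN_TOKEN', enabled: false)
```

A test helper could send this with `Net::HTTP.start(host, port, use_ssl: true) { |http| http.request(request) }` and fail loudly if the response is not successful, so that a failed cleanup is noticed immediately.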

Which pipelines/environments should automated end-to-end maintenance mode tests run in?

The job could run in orchestrated environments and in non-shared live environments (e.g. built with GET).
It should NOT run on shared live environments such as staging, pre-production, canary, or production.

Other notes

This is something we tested before maintenance mode was released, and may want to consider for automated testing:

Enabling maintenance mode should not impact write actions that are already in progress. Database write requests triggered before maintenance mode is enabled should still complete, and Geo replication and syncing should continue.

Deliverables for this issue:

List of automated end-to-end tests to be written. Should every replicable data type be tested in maintenance mode?
Plan for when and how these end-to-end tests should be executed in our QA pipelines. Should they have their own job, separate from ee:geo? Should they run in non-Geo test environments (excluding any secondary site steps)?

Edited Mar 17, 2021 by Jennifer Louie