Smoke/Load tests in production
It is often helpful to run more thorough tests against the production environment, provided they are safe to run there. Checking that pages load and respond quickly is useful, but some of the most important workflows involve manipulating data and taking more substantive actions.
If we are able to include such tests, we have a much better chance of detecting problems before, or as soon as, users start finding them.
Ideally we would be able to leverage many of the scripted tests and other actions that were built as part of our load testing support (https://gitlab.com/gitlab-org/gitlab-ee/issues/3016). If needed, some of these test suites could be specifically flagged as "safe" for production and then executed here.
This should both improve the quality of the tests that are written and allow them to be reused for additional value. With load tests, you typically want to perform more than simple page loads anyway. For example, e-commerce sites should test the whole end-to-end workflow: browsing for items, adding them to a cart, checking out, paying, etc.
These tests can then serve as the proverbial canary in the coal mine, detecting problems within workflows that simple page load tests would not.
A potential solution could look like:
- Add a flag to indicate which load tests are safe to run in production
- Generate a CI job to execute these against production on a periodic schedule
- Alarm if one of these tests fails X times in a row
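
To make the proposal more concrete, here is a minimal Python sketch of the flag and the alarm rule. Everything in it is hypothetical and for illustration only: the `LoadTest` type, the `production_safe` flag, the `ConsecutiveFailureAlarm` class, and the `run_production_suite` function are assumed names, not part of any existing GitLab feature or load testing tool. Tests are opt-in, so anything not explicitly flagged is never run against production.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LoadTest:
    """A scripted test. All field names here are hypothetical."""
    name: str
    run: Callable[[], bool]        # returns True when the workflow succeeds
    production_safe: bool = False  # opt-in flag: unsafe unless stated otherwise


class ConsecutiveFailureAlarm:
    """Fires once a test has failed `threshold` times in a row."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.failures: Dict[str, int] = {}

    def record(self, name: str, passed: bool) -> bool:
        """Record one result and return True if the alarm should fire."""
        if passed:
            self.failures[name] = 0  # any success resets the streak
            return False
        self.failures[name] = self.failures.get(name, 0) + 1
        return self.failures[name] >= self.threshold


def run_production_suite(tests: List[LoadTest],
                         alarm: ConsecutiveFailureAlarm) -> List[str]:
    """Run only the production-safe tests; return names that should alarm."""
    alarming = []
    for test in tests:
        if not test.production_safe:
            continue  # never execute unflagged tests against production
        try:
            passed = test.run()
        except Exception:
            passed = False  # a crashing test counts as a failure
        if alarm.record(test.name, passed):
            alarming.append(test.name)
    return alarming
```

In this sketch the failure counts live in memory. A periodic CI job usually starts from a clean process on every run, so a real implementation would need to persist the counts between runs (for example in a cache or an external store); that part is left out here.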