Set up a mixed deployment test environment
Occasionally, changes are introduced in GitLab-Canary that are incompatible with GitLab-Stable-Production.
For example, an encryption library update changed the format in which CI build tokens were stored in the database. A job was created through GitLab-Canary. When the runner later communicated with GitLab, its traffic was routed to GitLab-Stable-Production. That node did not have the new encryption library, so it could not decrypt the token stored in the database, and the runner began receiving unexpected errors. Although this initially looked like a Runner issue, the root cause was in the GitLab application itself.
These incompatibilities are most likely to occur in asynchronous processes that persist data in some format. If either the way the data is written or the way it is read changes, GitLab nodes running different versions can disagree about that data, which can cause failures. This happens when a feature is implemented without following the expand and contract pattern described in our development documentation.
- Ensure new developers are consistently trained to follow the expand/contract pattern mentioned above
- Provide periodic follow-up training to remind developers of this and other important patterns
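The expand/contract idea can be sketched against the token incident described above. This is a minimal illustration only: the `v1`/`v2` storage formats below are hypothetical base64 encodings standing in for the old and new encryption library output, not GitLab's actual token format, and every function name is invented for the example.

```python
import base64
import binascii

# Hypothetical format marker for the "new" storage format (illustration only).
V2_PREFIX = "v2:"


def encode_v1(token: str) -> str:
    # Hypothetical legacy format: standard base64 (stand-in for the old library).
    return base64.b64encode(token.encode()).decode()


def encode_v2(token: str) -> str:
    # Hypothetical new format: version prefix + URL-safe base64.
    return V2_PREFIX + base64.urlsafe_b64encode(token.encode()).decode()


def decode_v1_only(stored: str) -> str:
    # A node that predates the change: it only understands the legacy format.
    # validate=True makes the ':' in a v2 value raise binascii.Error.
    return base64.b64decode(stored, validate=True).decode()


def decode_expanded(stored: str) -> str:
    # Expand step: read both formats, so values written by old or new nodes work.
    if stored.startswith(V2_PREFIX):
        return base64.urlsafe_b64decode(stored[len(V2_PREFIX):]).decode()
    return base64.b64decode(stored, validate=True).decode()


def encode_for_rollout(token: str, all_nodes_read_v2: bool) -> str:
    # Switch writers to the new format only after every node can read it.
    # The contract step (dropping v1 reads) comes after existing data is migrated.
    return encode_v2(token) if all_nodes_read_v2 else encode_v1(token)


token = "runner-token-123"

# Mixed deployment during the expand step: writers still emit v1,
# so both the old reader and the expanded reader succeed.
stored = encode_for_rollout(token, all_nodes_read_v2=False)
print(decode_v1_only(stored), decode_expanded(stored))

# The failure mode from the incident: a new node writes v2 too early.
try:
    decode_v1_only(encode_v2(token))
except binascii.Error:
    print("old node cannot read a v2 value")
```

The ordering is what matters in this sketch: ship readers that accept both formats first (expand), switch writers only once every node runs that reader, and remove the legacy read path (contract) only after existing data has been migrated.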
Execute a suite of API-only tests for critical read/write API endpoints. Run these tests in both directions: write through a Canary version of GitLab and read through a Stable-Production version, then write through Stable-Production and read through Canary.
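A minimal sketch of the two-direction check, assuming a harness where each GitLab version can be driven through a simple write/read interface. `FakeGitLabNode`, the `v1:`/`v2:` formats, and `run_compatibility_matrix` are hypothetical; a real suite would exercise GitLab's API against the two deployments, which, as in the incident above, share one database.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, List, Tuple


def encode_v1(value: str) -> str:
    # Hypothetical legacy storage format.
    return "v1:" + value


def encode_v2(value: str) -> str:
    # Hypothetical new storage format introduced in Canary.
    return "v2:" + value


def decode_v1_only(stored: str) -> str:
    # Stable-Production reader in this sketch: rejects formats it does not know.
    if not stored.startswith("v1:"):
        raise ValueError(f"unsupported format: {stored[:3]!r}")
    return stored[3:]


def decode_v1_or_v2(stored: str) -> str:
    # Canary reader in this sketch: understands both formats.
    if stored[:3] in ("v1:", "v2:"):
        return stored[3:]
    raise ValueError(f"unsupported format: {stored[:3]!r}")


@dataclass
class FakeGitLabNode:
    """Hypothetical stand-in for one GitLab deployment.

    All nodes share one dict, mimicking Canary and Stable-Production
    sharing a database. A real suite would make API calls instead.
    """
    name: str
    db: Dict[str, str]
    encode: Callable[[str], str]
    decode: Callable[[str], str]

    def write(self, key: str, value: str) -> None:
        self.db[key] = self.encode(value)

    def read(self, key: str) -> str:
        return self.decode(self.db[key])


def run_compatibility_matrix(
    nodes: List[FakeGitLabNode],
) -> List[Tuple[str, str, bool]]:
    """Write through each node, then read the value back through each other node."""
    results = []
    for writer, reader in permutations(nodes, 2):
        key = f"{writer.name}->{reader.name}"
        expected = f"payload-from-{writer.name}"
        writer.write(key, expected)
        try:
            passed = reader.read(key) == expected
        except ValueError:
            passed = False
        results.append((writer.name, reader.name, passed))
    return results


shared_db: Dict[str, str] = {}
stable = FakeGitLabNode("stable", shared_db, encode_v1, decode_v1_only)
canary = FakeGitLabNode("canary", shared_db, encode_v2, decode_v1_or_v2)

for writer_name, reader_name, passed in run_compatibility_matrix([canary, stable]):
    print(f"write via {writer_name}, read via {reader_name}: {'PASS' if passed else 'FAIL'}")
# write via canary, read via stable: FAIL
# write via stable, read via canary: PASS
```

Using `itertools.permutations` covers every ordered writer/reader pair, so a single run exercises both directions, and adding a third version later would need no new test logic.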
Environment option 1:
Set up a QE API test scenario that, like the existing Geo tests, starts two GitLab instances on a high-capacity runner and executes the tests described above.
- Similar pattern to existing Geo tests
- Likely susceptible to the same flakiness that consistently affects attempts to run two GitLab instances in the same VM environment
Environment option 2:
Set up a separate mixed deployment environment in a new QE GCP project, where we have greater control over the resources needed to run two GitLab instances.
- Less likely to suffer from flakiness caused by resource constraints
- Requires additional tooling to manage GCP project setup and teardown (GET, the GitLab Environment Toolkit, could be leveraged to some degree, but environment management outside its current scope would still be required)