Which schema version should be used for background migration tests?
Background: !107791 (comment 1221187613)
Follow up for this issue: #388456
Too long, didn't read
Background migration tests should always run against the latest schema, and it should not be able to pick an older schema.
How do we pick schema version for background migration tests?
We do it like this to pick version 20221110045406
:
RSpec.describe Gitlab::BackgroundMigration::SanitizeConfidentialTodos, :migration, schema: 20221110045406 do
# ...
end
If we omit it like this:
RSpec.describe Gitlab::BackgroundMigration::SanitizeConfidentialTodos, :migration do
# ...
end
It'll get the default :latest
schema based on spec/spec_helper.rb:117
and based on spec/support/helpers/migrations_helpers.rb:126-127
it'll get to latest, which is the current schema.
What is the problem of sometimes we use latest and sometimes not?
The default :latest
makes sense because we run background migrations over a period after we run the regular migrations. This works when the background migration is written, until it's not, because latest schema is a moving target which will change over time, it'll break if we break the assumption of the schema at the time when the background migration was written.
When this happens, there are 3 things we can do (not all of them can be applicable, it depends on what breaks):
- Fix the background migration (when that's where it's broken) to adapt to the current latest schema
- Fix the test (when test data broken) to adapt to the current latest schema
- Pick a schema version which the background migration still works
If we want to support the background migration, we should not pick a schema version. However, the background migration can be so old that might be 1 year ago, which is like 10+ milestones away. Fixing those might not be trivial and if we just pick a schema version for the test that would be a much faster way to fix the test when it fails.
The background migration can still be working even when the test fails/breaks, because it can be the test was broken that it's not getting the assumption of the current schema right, rather than the background migration itself.
However, this caused numerous master:broken and failureflaky-test that sometimes it's very hard to diagnose, because it can cause side effects. gitlab-org/quality/engineering-productivity/master-broken-incidents#359 (comment 1201659042) was a recent case that it's difficult to track down.
We can probably say the test was not properly written, but we can also see that the schema version picked for that particular test was changed multiple times (see the full background at !106417 (merged)). I think this showed that we need to change how we approach this.
Proposed new 3 things we can do when background migration is broken
In the previous section, it's mentioned that the following 3 things are what we do:
- Fix the background migration (when that's where it's broken) to adapt to the current latest schema
- Fix the test (when test data broken) to adapt to the current latest schema
- Pick a schema version which the background migration still works
I am proposing we change that to this 3 (the first 2 are the same)
- Fix the background migration (when that's where it's broken) to adapt to the current latest schema
- Fix the test (when test data broken) to adapt to the current latest schema
- Delete the background migration tests, when it's too old and we no longer need to support it
- If we also delete the background migration itself, we also need to consolidate the schema so that the (post) migration using the background migration will not refer to the background migration
This means we no longer allow background migration tests to pick schema versions, because it does suppose to run with the current schema anyway. If it picks an older one, we don't know if it still works with the latest.
We need to define what is being "too old", too, and if it's causing master:broken and we can't identify what to revert, and we need to fix it quickly, we can probably just quarantine it.
If this is not desirable, then I'll propose this alternative which has only one choice:
- Always pick a schema version which the background migration will work
This is to make sure we're not introducing master:broken and failureflaky-test which are hard to tell how exactly we should fix the problem. Those broken background migrations can be so old that it's not practical to actually fix them quickly to stop master:broken.
Too long, didn't read
Background migration tests should always run against the latest schema, and it should not be able to pick an older schema.