Address Reset Column Information required in Data Migration Models

Problem

With our current approach to building data migrations, we require that migration code is isolated, i.e. we cannot rely on existing models while working on a data migration.

Background migrations must be isolated and can not use application code (e.g. models defined in app/models). Since these migrations can take a long time to run it’s possible for new versions to be deployed while they are still running.

This is true for background migrations, but we also follow that rule for all data migrations, including post-deployment ones.

That means migration authors have to replicate existing functionality in the migration class, by redefining every model they use as an ActiveRecord::Base class.

That keeps the migration code isolated, but creates a new problem: all migration classes are loaded at the beginning (when db:migrate starts), so they can get out of sync with the table schema they map to in case a migration updates that schema.

That makes the data migration fail when it tries to update the underlying table, because the new columns are reported as unknown attributes.

The solution we have found for this problem is to reset the column information by using reset_column_information in the up method of the migration for all the Models that update tables.
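The mechanism can be illustrated without Rails. The following is a toy sketch in plain Ruby, not Active Record's implementation: ToyTable and ToyModel are hypothetical stand-ins that only mimic a column list being cached before a schema change and then cleared by a reset.

```ruby
# Toy stand-ins, NOT Active Record: they only mimic a cached column list.
class ToyTable
  attr_reader :columns

  def initialize(*columns)
    @columns = columns
  end

  def add_column(name)
    @columns << name
  end
end

class ToyModel
  class UnknownAttributeError < StandardError; end

  def initialize(table)
    @table = table
  end

  # The column list is read from the table once and then memoized.
  def column_names
    @column_names ||= @table.columns.dup
  end

  # Forget the memoized list so the next access reads the table again.
  def reset_column_information
    @column_names = nil
  end

  # Reject attributes that are missing from the (possibly stale) column list.
  def build(attributes)
    attributes.each_key do |name|
      next if column_names.include?(name)

      raise UnknownAttributeError, "unknown attribute '#{name}'"
    end
    attributes
  end
end

users_table = ToyTable.new(:id, :username)
user_model = ToyModel.new(users_table)
user_model.column_names            # the cache is filled before the schema changes
users_table.add_column(:user_type) # a schema migration adds a column

stale_error = nil
begin
  user_model.build(username: "ghost", user_type: 1)
rescue ToyModel::UnknownAttributeError => e
  stale_error = e.message          # the stale cache does not know user_type
end

user_model.reset_column_information
record = user_model.build(username: "ghost", user_type: 1) # succeeds after the reset
```

The write only succeeds once the cached list has been dropped, which is what reset_column_information does for the migration models.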

Problem showcase

I am going to use as an example the CleanupProjectsWithMissingNamespace data migration that was introduced in !31675 (merged).

It has to do several things: create the ghost user and its namespace, add a new group with the ghost user as the owner, and then move all orphaned projects (ones with an invalid namespace_id) under that group.

As is evident, it touches a lot of tables, so it has to redefine classes for User, Namespace, Group, Member and Project.

The problem described is not evident while just running the latest migrations. It can only be reproduced when all the migrations run from scratch, by dropping the db and starting over.

To reproduce: remove the required reset_column_information calls from the CleanupProjectsWithMissingNamespace post-deployment migration and then run all the migrations from scratch. You will get the following:

$ bundle exec rake db:drop RAILS_ENV=development
$ bundle exec rake db:create RAILS_ENV=development
$ bundle exec rake db:migrate RAILS_ENV=development

... ... ... [all available migrations running] ... ... ...

== 20200511083541 CleanupProjectsWithMissingNamespace: migrating ==============
rake aborted!
StandardError: An error has occurred, all later migrations canceled:
unknown attribute 'user_type' for CleanupProjectsWithMissingNamespace::User.
... ...
../gitlab/db/post_migrate/20200511083541_cleanup_projects_with_missing_namespace.rb:137:in `create_unique_internal'

The reason is that all migration classes are loaded at the beginning, while the user_type column is only added once an earlier migration in the same run (20200304085423_add_user_type.rb, as the migrations log shows) has executed. By the time the data migration runs, the column cache in the User model is outdated, so it needs to be reset to make sure that everything works as expected.

What is described above is even more evident if we run db:migrate a second time after the first run fails. As the migration that altered the schema has already run and the data migration that was encountering the conflicts is the first one to run, the correct column information for all Models will be loaded and it will succeed without issues.

A minor note here is that explicitly referencing the class (e.g. CleanupProjectsWithMissingNamespace::User) does not solve the problem. We have to reset the column cache for the referenced table.

An older version of the Active Record Migrations guide discusses this a little bit more thoroughly, but as we can see from our own migrations, it is still true.

Solution

This is a problem encountered when updating data while using Models, not when selecting data, so we have not encountered it many times.

While working on a migration, we can never be sure whether another migration will later be placed before it in the release/development cycle (i.e. use a smaller version) and update the schema of a table that a data migration accesses through custom ActiveRecord::Base classes.

That could cause issues in our environments if the two migrations are picked to be deployed together, and it will certainly cause issues for anyone running the migrations from scratch. Thankfully, we have tests to catch this, and retrying the migrations solves the problem, but it would be better to have a proper solution where the migrations never fail and never cause concern to whoever runs them.

To be on the safe side, when defining custom Models in migrations, we should reset the column information for all the models that we use to update/insert data.

That is why the CleanupProjectsWithMissingNamespace migration resets the column information of all those models, even though at the time it shipped it only conflicted with the cached column information for User.

Way to address (two potential MRs)

As a first step, we should add that advice in the Isolation section of the Background Migrations guide.
As a more consistent, long term update, we can define a custom base class that resets the column information when used and require all migrations to use that rather than the ActiveRecord::Base class.
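One possible shape for the long-term idea is sketched below. Note that it is a variation, not the base class proposed above: it is a mixin for the migration itself that resets every nested model in one call. The module name ResetsMigrationModels and its method are hypothetical, and FakeModel is a plain-Ruby stand-in for ActiveRecord::Base so the sketch runs without Rails.

```ruby
# Hypothetical mixin: resets every model class nested in the migration.
module ResetsMigrationModels
  def reset_migration_models!
    self.class.constants(false)
        .map { |name| self.class.const_get(name) }
        .select { |const| const.is_a?(Class) && const.respond_to?(:reset_column_information) }
        .each(&:reset_column_information)
  end
end

# Stand-in for ActiveRecord::Base that only counts resets per class.
class FakeModel
  class << self
    def resets
      @resets || 0
    end

    def reset_column_information
      @resets = resets + 1
    end
  end
end

# Stand-in for a data migration with nested, isolated models.
class ExampleMigration
  include ResetsMigrationModels

  class User < FakeModel; end
  class Project < FakeModel; end

  BATCH_SIZE = 100 # non-model constants are skipped by the helper

  def up
    reset_migration_models!
    # ... data changes would follow here ...
  end
end

reset_models = ExampleMigration.new.up
```

With a helper like this, authors would not have to remember which models to list, at the cost of also resetting models that are only used for reads.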

Edited Jul 14, 2020 by Yannis Roussos