Rails: wait-for-deps race condition when waiting for migrations

Summary

Rails-based containers using wait-for-deps may not wait for migrations as intended, due to what appears to be a race condition.

Details

Discovered in Dedicated: wait-for-deps is subject to a race condition, so it can sometimes (possibly more often than not) fail to wait for migrations before releasing the Rails workload to run. This leads to the expected sort of errors, such as Rails processes not knowing about new fields.

It is caused by the recent MR that changed wait-for-deps to load the list of expected schema migrations from disk. That works fine in isolation, but https://gitlab.com/gitlab-org/build/CNG/-/blob/0d7b2b071be2d3eb2772051f02550f501845d06c/gitlab-rails/scripts/lib/checks/postgresql.rb#L112 uses SCHEMA_VERSIONS_DIR, which wait-for-deps has set to a relative directory. chdir is not thread safe: it changes the working directory of the whole process, not just of the calling thread. So if the redis check has executed its chdir first, the current directory will be /srv/gitlab/config. Then db/migrate will not exist as a subdir, codebase_schema_versions will return an empty array, and pending_migrations will always be empty (an empty set minus any other set is always an empty set).
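The failure mode can be reproduced outside of wait-for-deps. The following is a minimal sketch, not the actual check code: `versions_in` and `demo_chdir_race` are hypothetical names, the migration file name and the "applied" version are made up, and a temporary directory stands in for /srv/gitlab. It only illustrates that a chdir performed in one thread changes how a relative path resolves everywhere else in the process.

```ruby
require 'fileutils'
require 'set'
require 'tmpdir'

# Hypothetical stand-in for the check's version lookup: collect the numeric
# prefixes of migration files under dir, which may be a relative path.
def versions_in(dir)
  Dir.glob(File.join(dir, '*.rb')).map { |f| File.basename(f)[/\A\d+/] }.to_set
end

# Builds a throwaway tree with db/migrate and config, then resolves the same
# relative path before and after another thread changes directory.
def demo_chdir_race
  original = Dir.pwd
  Dir.mktmpdir do |root|
    begin
      FileUtils.mkdir_p(File.join(root, 'db', 'migrate'))
      FileUtils.mkdir_p(File.join(root, 'config'))
      FileUtils.touch(File.join(root, 'db', 'migrate', '20240101000000_add_field.rb'))

      Dir.chdir(root)
      before = versions_in('db/migrate') # resolved against root

      # Simulates the other check: the working directory is process-wide,
      # so this also moves the thread that is about to read db/migrate.
      Thread.new { Dir.chdir(File.join(root, 'config')) }.join
      after = versions_in('db/migrate') # now resolved against root/config

      applied = Set['20230101000000'] # made-up "already applied" version
      { before: before, after: after, pending: after - applied }
    ensure
      Dir.chdir(original) # restore the caller's working directory
    end
  end
end
```

Calling `demo_chdir_race` should return one version under `:before` and empty sets under `:after` and `:pending`, which matches the symptom described: nothing appears to be pending, so nothing is waited for.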

In practice, the redis code is much more likely to execute first because the postgresql code has to make a connection to the DB before it goes looking at the config files, so I'd actually wager this is a race condition that wait-for-deps is destined to lose.

A possible solution, which worked in my initial testing, is to set $SCHEMA_VERSIONS_DIR to an absolute path rather than a relative one; I'll construct an MR shortly.
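As a rough sketch of that idea (not the eventual MR): resolve the directory against a fixed root once, so later chdir calls cannot change what it points at. The helper name `absolute_schema_versions_dir` is hypothetical, and /srv/gitlab as the application root is an assumption inferred from the /srv/gitlab/config path mentioned above.

```ruby
# Hypothetical helper: turn a possibly relative directory into an absolute
# path anchored at app_root instead of the current working directory.
# File.expand_path leaves an already absolute path unchanged.
def absolute_schema_versions_dir(relative_dir, app_root)
  File.expand_path(relative_dir, app_root)
end

# Assumed values, for illustration only.
schema_versions_dir = absolute_schema_versions_dir('db/migrate', '/srv/gitlab')
```

With an absolute value, the lookup no longer depends on which check called chdir first.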

Internal investigation: https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/issues/6300#note_2128175207

Edited by Jason Plum