Add Rails/Sidekiq version consistency check as part of post-deployment validation for multi-node environments


Proposal

After a multi-node GitLab environment is deployed or upgraded, include a check (manual or automated) that retrieves the application version from every Rails and Sidekiq service and alerts users if there is a mismatch anywhere in the environment.

This would greatly speed up root-cause identification and remediation when errors occur after a deployment.
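A minimal sketch of the comparison step, assuming the version string of each running Rails and Sidekiq service has already been collected per node by some other means (collection is out of scope here). The function name `find_version_mismatches` and the input format, a mapping of node name to version string, are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def find_version_mismatches(reported: dict[str, str]) -> dict[str, list[str]]:
    """Group nodes by the application version they report.

    `reported` maps a node name to the version string its running Rails or
    Sidekiq service reported (hypothetical input format).
    Returns {} when every node agrees; otherwise returns a mapping of
    version -> sorted list of nodes serving that version.
    """
    by_version: dict[str, list[str]] = defaultdict(list)
    for node, version in reported.items():
        by_version[version].append(node)
    # Zero or one distinct version means the environment is consistent.
    if len(by_version) <= 1:
        return {}
    return {version: sorted(nodes) for version, nodes in by_version.items()}


if __name__ == "__main__":
    # Illustrative data only: one Sidekiq node still serves an older version.
    reported = {"rails-1": "16.5.1", "rails-2": "16.5.1", "sidekiq-1": "16.4.3"}
    for version, nodes in sorted(find_version_mismatches(reported).items()):
        print(f"{version}: {', '.join(nodes)}")
```

Grouping by version rather than comparing against a single "expected" value means the check does not need to know which version is correct; it only reports that the environment disagrees with itself and which nodes fall on each side.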

Example failure scenarios

Here are some cases Customer Support has encountered where a version consistency check would have helped identify the root cause, or might even have enabled sysadmins to identify and resolve the problem themselves before contacting support.

Did not upgrade a Rails or Sidekiq node

The GitLab package was upgraded on some, but not all, nodes. The remaining nodes continue to serve the previous application version, resulting in errors (typically NoMethodError and other similar code-schema mismatch problems).

Did not restart a Rails or Sidekiq node

The updated GitLab packages were installed, but the Rails or Sidekiq services were not restarted on one or more nodes. The previous application version continues to be served, resulting in errors (typically NoMethodError and other similar code-schema mismatch problems).
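A cross-node comparison can miss this case if every node was left unrestarted, so a per-node check is also useful: compare the version of the package installed on disk with the version reported by the running service. The sketch below assumes both values have already been gathered for each node; the `NodeReport` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    # Hypothetical per-node report; field names are illustrative only.
    name: str
    installed_version: str  # version of the package installed on disk
    running_version: str    # version reported by the running Rails/Sidekiq service


def nodes_needing_restart(reports: list[NodeReport]) -> list[str]:
    """Return sorted names of nodes whose running version differs from the installed one."""
    return sorted(r.name for r in reports if r.installed_version != r.running_version)


if __name__ == "__main__":
    reports = [
        NodeReport("rails-1", "16.5.1", "16.5.1"),
        NodeReport("sidekiq-1", "16.5.1", "16.4.3"),  # upgraded but not restarted
    ]
    print(nodes_needing_restart(reports))
```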

Rogue Sidekiq node in the environment

Sometimes, an old and forgotten (!) Sidekiq server continues to take jobs from Redis, but errors out due to code-schema mismatch problems. This results in inconsistent behaviour in the UI due to background jobs not completing.
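A forgotten server is not in anyone's inventory, so comparing versions only across known nodes will not find it. One possible approach is to compare the hosts that are actually processing Sidekiq jobs against the expected inventory. The sketch assumes the list of active hosts is obtained elsewhere (for example, Sidekiq's `Sidekiq::ProcessSet` API lists running processes with their hostnames, though how it is queried in a given GitLab setup is an assumption here); the function name is hypothetical.

```python
from collections.abc import Iterable


def compare_sidekiq_hosts(
    seen: Iterable[str], expected: Iterable[str]
) -> tuple[set[str], set[str]]:
    """Compare hosts seen processing Sidekiq jobs with the expected inventory.

    Returns (unexpected, missing):
      unexpected: hosts taking jobs that are not in the inventory (possible rogue nodes)
      missing:    inventory hosts that are not processing jobs
    """
    seen_set, expected_set = set(seen), set(expected)
    return seen_set - expected_set, expected_set - seen_set


if __name__ == "__main__":
    seen = ["sidekiq-1", "sidekiq-2", "old-sidekiq"]
    inventory = ["sidekiq-1", "sidekiq-2"]
    unexpected, missing = compare_sidekiq_hosts(seen, inventory)
    print("unexpected:", sorted(unexpected), "missing:", sorted(missing))
```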
