Add Rails/Sidekiq version consistency check as part of post-deployment validation for multi-node environments
Proposal
After a multi-node GitLab environment is deployed or upgraded, include a check (manual or automated) that retrieves the application version from every Rails and Sidekiq node and alerts users if the versions do not all match.
This would greatly speed up root-cause identification and remediation when errors occur after a deployment.
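A minimal sketch of the comparison step, assuming the version strings have already been collected from each node. For example, a Rails node can be asked directly (not through the load balancer) via the authenticated /api/v4/version endpoint. How to collect the version from Sidekiq nodes is left open here. The NodeVersion record, find_version_mismatch and format_alert are hypothetical names for illustration, not part of GitLab.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeVersion:
    """Version reported by a single node (hypothetical record type)."""
    host: str
    role: str     # e.g. "rails" or "sidekiq"
    version: str  # collected beforehand; the collection method is an assumption


def find_version_mismatch(nodes):
    """Group nodes by the version they report.

    Returns {} when all nodes agree (or no nodes were given); otherwise
    returns a mapping of version -> sorted "role:host" labels.
    """
    by_version = defaultdict(list)
    for node in nodes:
        by_version[node.version].append(f"{node.role}:{node.host}")
    if len(by_version) <= 1:
        return {}
    return {version: sorted(labels) for version, labels in by_version.items()}


def format_alert(mismatch):
    """Render an alert message, or None when there is nothing to report."""
    if not mismatch:
        return None
    lines = ["Version mismatch detected across Rails/Sidekiq nodes:"]
    # Lexicographic order is good enough for display purposes.
    for version in sorted(mismatch):
        lines.append(f"  {version}: {', '.join(mismatch[version])}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = [
        NodeVersion("rails-1", "rails", "16.11.2"),
        NodeVersion("rails-2", "rails", "16.11.2"),
        NodeVersion("sidekiq-1", "sidekiq", "16.10.5"),
    ]
    alert = format_alert(find_version_mismatch(nodes))
    print(alert or "All nodes report the same version.")
```

Grouping by reported version, rather than comparing against one "expected" version, means the check needs no prior knowledge of the upgrade target. If a commit revision is also available, comparing it as well would catch nodes that share a version string but run different builds.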
Example failure scenarios
Below are cases that Customer Support has encountered where a version consistency check would have helped identify the root cause, or might even have enabled sysadmins to identify and resolve the problem themselves before contacting support.
Did not upgrade a Rails or Sidekiq node
The GitLab package was upgraded on some, but not all, nodes. The remaining nodes continue to serve the previous application version, resulting in errors (typically NoMethodError and similar code-schema mismatch problems).
Did not restart a Rails or Sidekiq node
The updated GitLab packages were installed, but the Rails or Sidekiq services were not restarted on one or more nodes. Those nodes continue to serve the previous application version, resulting in errors (typically NoMethodError and similar code-schema mismatch problems).
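In this scenario, checking only the installed package version on each node would not reveal the problem, because the package is already up to date. A sketch of a complementary check, assuming two values can be gathered per node: the installed package version and the version reported by the live service (obtaining the latter, especially for Sidekiq, is an assumption left open here). NodeState and find_unrestarted_nodes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeState:
    """Installed vs. running version on one node (hypothetical record type)."""
    host: str
    installed: str          # version of the installed GitLab package
    running: Optional[str]  # version reported by the live service; None if unknown


def find_unrestarted_nodes(states):
    """Split nodes into (stale, unknown) lists of hostnames.

    stale:   the running version differs from the installed package,
             which suggests the service was not restarted after the upgrade.
    unknown: no running version could be obtained, so the node is reported
             instead of being silently treated as healthy.
    """
    stale = sorted(
        s.host for s in states
        if s.running is not None and s.running != s.installed
    )
    unknown = sorted(s.host for s in states if s.running is None)
    return stale, unknown


if __name__ == "__main__":
    states = [
        NodeState("rails-1", "16.11.2", "16.11.2"),
        NodeState("sidekiq-1", "16.11.2", "16.10.5"),
        NodeState("sidekiq-2", "16.11.2", None),
    ]
    stale, unknown = find_unrestarted_nodes(states)
    print("Restart needed:", stale)
    print("Running version unknown:", unknown)
```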
Rogue Sidekiq node in the environment
Sometimes an old, forgotten (!) Sidekiq server keeps taking jobs from Redis but fails to process them because of code-schema mismatches. Because those background jobs never complete, the UI behaves inconsistently.
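A rogue node like this can be caught by comparing the hosts actually running Sidekiq against the hosts the administrator expects. The sketch below assumes the observed hostnames are gathered beforehand, for example from the process list on the Busy page of the Sidekiq web UI. The expected list is assumed to come from the environment's own inventory. find_rogue_sidekiq_hosts and _normalize_hosts are hypothetical names.

```python
def _normalize_hosts(hostnames):
    """Lower-case and strip hostnames, since DNS names are case-insensitive."""
    return {h.strip().lower() for h in hostnames if h.strip()}


def find_rogue_sidekiq_hosts(observed_hosts, expected_hosts):
    """Compare hosts seen running Sidekiq with the hosts that should be.

    Returns (rogue, missing), both sorted lists of normalized hostnames:
      rogue:   hosts running Sidekiq that are not in the inventory
      missing: inventory hosts with no Sidekiq process observed
    """
    observed = _normalize_hosts(observed_hosts)
    expected = _normalize_hosts(expected_hosts)
    return sorted(observed - expected), sorted(expected - observed)


if __name__ == "__main__":
    rogue, missing = find_rogue_sidekiq_hosts(
        observed_hosts=["sidekiq-1", "old-sidekiq", "Sidekiq-2"],
        expected_hosts=["sidekiq-1", "sidekiq-2", "sidekiq-3"],
    )
    print("Unexpected Sidekiq hosts:", rogue)
    print("Expected hosts not seen:", missing)
```

Reporting missing hosts alongside rogue ones costs nothing extra and also flags expected Sidekiq nodes that are down after an upgrade.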