Misleading error message when Active Record encryption keys are missing during upgrade
Summary
When Active Record encryption keys are missing from the gitlab-rails-secret during a GitLab upgrade (particularly from 17.x to 18.x), users encounter a misleading error message that suggests a filesystem/storage problem. The actual root cause (missing encryption keys) is not clearly communicated, so users waste time troubleshooting the wrong issue.
Steps to Reproduce
- Deploy GitLab with a Helm chart that's missing Active Record encryption keys in gitlab-rails-secret
- Attempt to start GitLab webservice or sidekiq pods
- Observe the error message in pod logs
Current Behavior
Users see this error:
rake aborted!
Errno::EBUSY: Device or resource busy @ apply2files - /srv/gitlab/config/secrets.yml
Caused by:
Errno::EXDEV: Invalid cross-device link @ rb_file_s_rename - (/srv/gitlab/config/secrets.yml, /srv/gitlab/tmp/backups/secrets.yml.orig.1770308315)
What this error suggests to users:
- Filesystem issue
- Storage problem
- Cross-device link error (tmpfs/mount issue)
- Device resource exhaustion
What users do:
- Investigate storage configuration
- Check filesystem mounts
- Review volume configurations
- Troubleshoot Kubernetes storage
- Spend hours on the wrong problem
Expected Behavior
The error message should:
- Clearly state that Active Record encryption keys are missing
- List the missing keys:
  - active_record_encryption_primary_key
  - active_record_encryption_deterministic_key
  - active_record_encryption_key_derivation_salt
- Explain why these keys are required (GitLab 17.8+)
- Provide guidance on how to fix it:
  - Generate the keys
  - Add them to gitlab-rails-secret
  - Restart the pods
- Link to documentation about encryption key requirements
Root Cause
The initialization script attempts to back up and modify secrets.yml when encryption keys are missing. Since secrets.yml is mounted as a Kubernetes secret (tmpfs), and /srv/gitlab/tmp is on a different filesystem, the atomic rename operation fails with Errno::EXDEV.
However, the real issue is that the encryption keys should have been present in the secret from the start. The filesystem error is a symptom, not the cause.
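The cross-device condition can be checked directly. The sketch below compares the device IDs reported by `File.stat`, and a rename between two paths can only be atomic when they match. The helper name `same_device?` is hypothetical, and the temporary directory stands in for the real mount points.

```ruby
require "tmpdir"

# Hypothetical helper: true when both paths live on the same filesystem,
# which is the precondition for File.rename to avoid Errno::EXDEV.
def same_device?(path_a, path_b)
  File.stat(path_a).dev == File.stat(path_b).dev
end

Dir.mktmpdir do |dir|
  secrets = File.join(dir, "secrets.yml")
  File.write(secrets, "production: {}\n")

  # The file and its parent directory share a filesystem here.
  puts same_device?(secrets, dir)
end
```

In the affected pods, comparing /srv/gitlab/config/secrets.yml with /srv/gitlab/tmp/backups would be expected to return false, given the tmpfs mount described above.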
Impact
- Users cannot diagnose the actual problem from the error message
- Troubleshooting time is wasted on filesystem/storage investigation
- Upgrade process is blocked with unclear guidance
- Particularly affects users upgrading from GitLab 17.x to 18.x with Helm deployments
- Related to #591430 where the shared-secrets hook fails to update existing secrets
Affected Versions
- GitLab 17.8+ (when Active Record encryption was introduced)
- Particularly affects users upgrading from pre-17.8 versions
Possible Fixes
- Add validation before attempting to initialize encryption keys:
  - Check if keys exist in secrets.yml
  - If missing, raise a clear error with actionable guidance
  - Don't attempt to modify the file if keys are missing
- Improve error message to include:
  - Specific list of missing encryption keys
  - Explanation of why they're required
  - Step-by-step instructions to add them
  - Link to documentation
- Add pre-flight checks during startup:
  - Validate all required encryption keys exist
  - Fail fast with clear messaging
  - Prevent attempting filesystem operations that will fail
- Update documentation to explain:
  - Encryption key requirements for v17.8+
  - How to generate keys
  - How to add them to gitlab-rails-secret
  - Troubleshooting guide for missing keys
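A rough sketch of the validation and pre-flight idea follows. It assumes secrets.yml has a top-level environment key (here `production`) with the encryption keys beneath it. The names `missing_encryption_keys`, `preflight_check!`, and `MissingEncryptionKeysError` are hypothetical and do not refer to existing GitLab code. The check only reads the parsed YAML, so it never touches the filesystem in a way that could trigger the rename failure.

```ruby
require "yaml"

# Key names as listed in this issue.
ENCRYPTION_KEYS = %w[
  active_record_encryption_primary_key
  active_record_encryption_deterministic_key
  active_record_encryption_key_derivation_salt
].freeze

# Hypothetical error class for a fail-fast startup check.
class MissingEncryptionKeysError < StandardError; end

# Returns the required keys that are absent or empty for the environment.
def missing_encryption_keys(secrets, env: "production")
  section = secrets[env] || {}
  ENCRYPTION_KEYS.select do |key|
    value = section[key]
    value.nil? || (value.respond_to?(:empty?) && value.empty?)
  end
end

# Raises before any attempt to back up or rewrite the file.
def preflight_check!(yaml_text, env: "production")
  secrets = YAML.safe_load(yaml_text) || {}
  missing = missing_encryption_keys(secrets, env: env)
  return if missing.empty?

  raise MissingEncryptionKeysError,
        "Missing Active Record encryption keys: #{missing.join(', ')}"
end

# Example input with only one of the three keys present (placeholder value).
sample = <<~YAML
  production:
    secret_key_base: abc
    active_record_encryption_primary_key:
      - example
YAML

begin
  preflight_check!(sample)
rescue MissingEncryptionKeysError => e
  warn e.message
end
```

Failing at this point would surface the missing keys by name instead of the EXDEV/EBUSY trace.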
Related Issues
- #591430 - Helm shared-secrets hook doesn't update existing secrets during chart upgrades, leaving encryption keys missing
This issue is a symptom of #591430. When the shared-secrets hook fails to update the secret with new encryption keys, users encounter this misleading error message.