Misleading error message when Active Record encryption keys are missing during upgrade

Summary

When the Active Record encryption keys are missing from gitlab-rails-secret during a GitLab upgrade (particularly from 17.x to 18.x), users see an error message that points to a filesystem or storage problem. The actual root cause, the missing encryption keys, is never mentioned, so users waste time troubleshooting the wrong issue.

Steps to Reproduce

  1. Deploy GitLab with the Helm chart, using a gitlab-rails-secret that lacks the Active Record encryption keys
  2. Attempt to start the GitLab webservice or sidekiq pods
  3. Observe the error message in pod logs

Current Behavior

Users see this error:

rake aborted!
Errno::EBUSY: Device or resource busy @ apply2files - /srv/gitlab/config/secrets.yml

Caused by:
Errno::EXDEV: Invalid cross-device link @ rb_file_s_rename - (/srv/gitlab/config/secrets.yml, /srv/gitlab/tmp/backups/secrets.yml.orig.1770308315)

What this error suggests to users:

  • Filesystem issue
  • Storage problem
  • Cross-device link error (tmpfs/mount issue)
  • Device resource exhaustion

What users do:

  • Investigate storage configuration
  • Check filesystem mounts
  • Review volume configurations
  • Troubleshoot Kubernetes storage
  • Spend hours on the wrong problem

Expected Behavior

The error message should:

  1. Clearly state that Active Record encryption keys are missing
  2. List the missing keys:
    • active_record_encryption_primary_key
    • active_record_encryption_deterministic_key
    • active_record_encryption_key_derivation_salt
  3. Explain why these keys are required (GitLab 17.8+)
  4. Provide guidance on how to fix it:
    • Generate the keys
    • Add them to gitlab-rails-secret
    • Restart the pods
  5. Link to documentation about encryption key requirements
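
The "Generate the keys" step could be sketched as follows. This is a minimal illustration, not GitLab's own tooling: it assumes 32-character alphanumeric values (the format that Rails' db:encryption:init task prints), and the helper name generate_encryption_keys and the constant KEY_NAMES are hypothetical. The exact shape expected inside gitlab-rails-secret (for example, whether a key is stored as a single string or as a list) should be checked against the GitLab documentation.

```ruby
require "securerandom"
require "yaml"

# The three key names listed in this issue.
KEY_NAMES = %w[
  active_record_encryption_primary_key
  active_record_encryption_deterministic_key
  active_record_encryption_key_derivation_salt
].freeze

# Hypothetical helper: returns a hash with one random value per key name.
# The default length of 32 is an assumption, not a documented GitLab requirement.
def generate_encryption_keys(length: 32)
  KEY_NAMES.to_h { |name| [name, SecureRandom.alphanumeric(length)] }
end

# Print a YAML fragment that can be reviewed before it is merged into the secret by hand.
puts generate_encryption_keys.to_yaml
```

If an earlier deployment or backup already contains these keys, reuse those values instead of generating new ones; data encrypted with the old keys cannot be read with freshly generated ones.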

Root Cause

The initialization script attempts to back up and modify secrets.yml when encryption keys are missing. Because secrets.yml is mounted from a Kubernetes secret (tmpfs) and /srv/gitlab/tmp is on a different filesystem, the atomic rename operation fails with Errno::EXDEV.

However, the real issue is that the encryption keys should have been present in the secret from the start. The filesystem error is a symptom, not the cause.
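
The two-level error in the log is consistent with a move that falls back from rename to copy-and-delete. The sketch below does not touch a real mount; it only simulates the sequence with hand-raised exceptions, to show how Ruby attaches the original Errno::EXDEV as the cause of the later Errno::EBUSY, which is what rake prints under "Caused by:". The function name simulated_move is hypothetical. That the initialization script uses a rename-then-fallback move (as Ruby's FileUtils.mv does) is an inference from the rb_file_s_rename and apply2files markers in the trace, not something confirmed from the script's source.

```ruby
# Simulates the failure sequence only; no files are touched.
def simulated_move(src, dest)
  # Step 1: a rename across two filesystems fails with EXDEV.
  raise Errno::EXDEV, "(#{src}, #{dest})"
rescue Errno::EXDEV
  # Step 2: a fallback would copy the file and then delete the source.
  # Assumption: the source is a mounted file that cannot be unlinked,
  # so the delete fails with EBUSY. Raising inside this rescue makes
  # Ruby record the EXDEV error as the new exception's cause.
  raise Errno::EBUSY, src
end

begin
  simulated_move("/srv/gitlab/config/secrets.yml",
                 "/srv/gitlab/tmp/backups/secrets.yml.orig")
rescue Errno::EBUSY => e
  puts "#{e.class}: #{e.message}"
  puts "Caused by: #{e.cause.class}"
end
```

Neither exception in the chain carries any information about encryption keys, which is why the log gives no hint of the real problem.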

Impact

  • Users cannot diagnose the actual problem from the error message
  • Troubleshooting time is wasted on filesystem/storage investigation
  • Upgrade process is blocked with unclear guidance
  • Particularly affects users upgrading from GitLab 17.x to 18.x with Helm deployments
  • Related to #591430, where the shared-secrets hook fails to update existing secrets

Affected Versions

  • GitLab 17.8+ (when Active Record encryption was introduced)
  • Particularly affects users upgrading from pre-17.8 versions

Possible Fixes

  1. Add validation before attempting to initialize encryption keys:

    • Check if keys exist in secrets.yml
    • If missing, raise a clear error with actionable guidance
    • Don't attempt to modify the file if keys are missing
  2. Improve error message to include:

    • Specific list of missing encryption keys
    • Explanation of why they're required
    • Step-by-step instructions to add them
    • Link to documentation
  3. Add pre-flight checks during startup:

    • Validate all required encryption keys exist
    • Fail fast with clear messaging
    • Prevent attempting filesystem operations that will fail
  4. Update documentation to explain:

    • Encryption key requirements for v17.8+
    • How to generate keys
    • How to add them to gitlab-rails-secret
    • Troubleshooting guide for missing keys
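
Fixes 1 to 3 above could be combined into a single fail-fast check that runs before any file operation. The following is a minimal sketch under stated assumptions, not GitLab's actual initializer: it assumes secrets.yml is a YAML mapping keyed by environment name (here "production"), and the names REQUIRED_KEYS, MissingEncryptionKeysError, missing_encryption_keys, and validate_encryption_keys! are hypothetical.

```ruby
require "yaml"

# The three key names listed in this issue.
REQUIRED_KEYS = %w[
  active_record_encryption_primary_key
  active_record_encryption_deterministic_key
  active_record_encryption_key_derivation_salt
].freeze

class MissingEncryptionKeysError < StandardError; end

# Returns the required key names that are absent or empty in the given
# environment section of the parsed secrets hash.
def missing_encryption_keys(secrets, env: "production")
  section = secrets.fetch(env, {}) || {}
  REQUIRED_KEYS.select do |key|
    value = section[key]
    value.nil? || (value.respond_to?(:empty?) && value.empty?)
  end
end

# Raises a descriptive error instead of attempting to modify the file.
def validate_encryption_keys!(secrets, env: "production")
  missing = missing_encryption_keys(secrets, env: env)
  return if missing.empty?

  raise MissingEncryptionKeysError, <<~MSG
    Active Record encryption keys are missing from gitlab-rails-secret:
    #{missing.map { |key| "  - #{key}" }.join("\n")}
    These keys are required by GitLab 17.8 and later.
    To fix: generate the keys, add them to gitlab-rails-secret, and restart the pods.
  MSG
end

# Example input with only one of the three keys present (values are placeholders).
secrets = YAML.safe_load(<<~SECRETS)
  production:
    secret_key_base: example
    active_record_encryption_primary_key:
      - example
SECRETS

begin
  validate_encryption_keys!(secrets)
rescue MissingEncryptionKeysError => e
  warn e.message
end
```

Because the check only reads the parsed secrets, it never reaches the backup and rename step, so the EXDEV and EBUSY errors would not appear in this situation.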

Related Issues

  • #591430 - Helm shared-secrets hook doesn't update existing secrets during chart upgrades, leaving encryption keys missing

This issue is a symptom of #591430. When the shared-secrets hook fails to update the secret with the new encryption keys, users encounter this misleading error message.

Edited Feb 25, 2026 by 🤖 GitLab Bot 🤖