fix: gracefully handle inconsistent lockfile states from REGISTRY_FF_ENFORCE_LOCKFILES misuse

Problem Summary

The container registry's REGISTRY_FF_ENFORCE_LOCKFILES feature flag was incorrectly implemented in multiple critical code paths. Instead of controlling whether lockfiles are enforced, the flag was used to control whether lockfiles are managed at all. This has resulted in inconsistent lockfile states across many self-managed instances that have migrated to the metadata database.

Severity: S2 (High) - Workaround available via manual lockfile management or feature flag disabling


Workaround Instructions

If you're experiencing registry startup failures with the error "registry filesystem metadata in use, please import data before enabling the database" after migrating to the metadata database, you can use one of the following workarounds depending on your GitLab installation method.

Option 1: Set the Feature Flag (Recommended)

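Explicitly disabling the feature flag stops the registry from enforcing lockfiles at startup, which avoids the error above without touching any lockfiles. This is the safest approach.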

For Linux Package (Omnibus) Installations

Add the following to your /etc/gitlab/gitlab.rb file:

registry['env'] = {
  'REGISTRY_FF_ENFORCE_LOCKFILES' => false,
}

Then reconfigure GitLab:

sudo gitlab-ctl reconfigure

For Docker/Docker Compose Installations

Important: Setting the environment variable directly in Docker Compose does NOT work. You must configure it through gitlab.rb.

Create or edit your gitlab.rb configuration file and add:

registry['env'] = {
  'REGISTRY_FF_ENFORCE_LOCKFILES' => false,
}

Mount this file into the GitLab container at /etc/gitlab/gitlab.rb and make sure GitLab reconfigures on startup, for example by restarting the container.

For Helm Chart (Kubernetes) Installations

Add the following to your values.yaml:

registry:
  extraEnv:
    REGISTRY_FF_ENFORCE_LOCKFILES: "false"

Then upgrade your Helm release:

helm upgrade gitlab gitlab/gitlab -f values.yaml

Option 2: Manual Lockfile Removal

Only use this option if you are confident your registry is supposed to be using the metadata database.

Locate the Lockfile

The lockfile location depends on your storage configuration:

  • Local filesystem storage: /path/to/rootdirectory/docker/registry/lockfiles/filesystem-in-use
  • S3 or other object storage: the same path inside your bucket, docker/registry/lockfiles/filesystem-in-use

Remove the Lockfile

For local filesystem:

rm /path/to/rootdirectory/docker/registry/lockfiles/filesystem-in-use

For S3 storage:

Use the AWS CLI or your S3 management console to delete the docker/registry/lockfiles/filesystem-in-use object from your bucket.
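For the S3 case, a minimal sketch of building the object URI and deleting it with the AWS CLI is shown below. The helper name lockfile_s3_uri and the bucket name my-registry-bucket are hypothetical. The optional prefix argument assumes that your S3 driver is configured with a rootdirectory, in which case the lockfile key sits underneath it; leave it out if no rootdirectory is set.

```shell
#!/bin/sh
# Print the S3 URI of the filesystem-in-use lockfile.
#   $1: bucket name (placeholder, use your own)
#   $2: optional key prefix, assumed to match the S3 driver's rootdirectory
lockfile_s3_uri() {
  bucket="$1"
  prefix="${2:-}"
  prefix="${prefix#/}"   # drop one leading slash, if present
  prefix="${prefix%/}"   # drop one trailing slash, if present
  key="docker/registry/lockfiles/filesystem-in-use"
  if [ -n "$prefix" ]; then
    key="$prefix/$key"
  fi
  printf 's3://%s/%s\n' "$bucket" "$key"
}

# Usage (commented out so that sourcing this file never deletes anything):
#   aws s3 ls "$(lockfile_s3_uri my-registry-bucket)"   # confirm the object exists
#   aws s3 rm "$(lockfile_s3_uri my-registry-bucket)"   # delete the lockfile
```

Running the aws s3 ls line first is a cheap way to confirm you are pointing at the right bucket and key before deleting anything.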

Note: After removing the lockfile, restart the registry service (on Linux package installations: sudo gitlab-ctl restart registry).

Verification

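After applying either workaround, verify that the registry is functioning. The commands below use gitlab-ctl, so they apply to Linux package and Docker installations; on Helm installations, check the registry pod status and logs instead: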

# Check registry status
sudo gitlab-ctl status registry

# Check registry logs
sudo gitlab-ctl tail registry

The registry should start successfully without the lockfile error.


What Went Wrong

The feature flag check was placed at the wrong level in several functions, causing an early return that prevented essential lockfile operations from executing:

1. Importer Lockfile Handling (!2648 (merged))

In the importer's handleLockers function, when REGISTRY_FF_ENFORCE_LOCKFILES was disabled (the default), the function would return early without:

  • Unlocking the filesystem-in-use lockfile after successful import
  • Locking the database-in-use lockfile to prevent the registry from starting in filesystem mode

As a result, imported registries kept the filesystem-in-use lockfile and never received the database-in-use lockfile, which is the opposite of the intended state.

2. Application Startup Lockfile Handling (!2649)

A similar issue occurred during registry application startup: lockfiles were not managed when the feature flag was disabled.

3. Feature Flag Default Value (!2647 (merged))

The feature flag was originally false by default, but was changed to true as part of the normal feature flag removal process. This exposed the bug. We're setting it back to false while the fixes are rolled out.

What We're Fixing

This issue tracks three related merge requests that address the lockfile management problems:

  1. !2647 (merged) - Disable REGISTRY_FF_ENFORCE_LOCKFILES by default
    • Changes the feature flag default to false
    • Prevents existing instances from encountering the issue
  2. !2648 (merged) - Fix importer lockfile handling
    • Removes the feature flag check from handleLockers
    • Ensures lockfiles are properly managed after imports complete
    • Fixes: filesystem lockfiles not being removed and database lockfiles not being set
  3. !2649 - Fix application startup lockfile handling
    • Ensures lockfiles are managed during app initialization regardless of feature flag state

Impact

Who is affected:

  • Any instance that has migrated to the metadata database while REGISTRY_FF_ENFORCE_LOCKFILES was disabled
  • Instances may have:
    • filesystem-in-use lockfiles that were never removed after migration
    • Missing database-in-use lockfiles
    • Inconsistent lockfile states that could cause startup issues
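To see which of these states a local-filesystem instance is in, a small check like the following may help. It is only a sketch: the function name lockfile_state is hypothetical, and the database-in-use path is an assumption that mirrors the filesystem-in-use path given in the workaround instructions.

```shell
#!/bin/sh
# Classify the lockfile state under a registry storage root directory.
#   $1: the storage rootdirectory (local filesystem driver)
# Assumption: database-in-use lives next to filesystem-in-use.
lockfile_state() {
  dir="$1/docker/registry/lockfiles"
  fs=no
  db=no
  if [ -e "$dir/filesystem-in-use" ]; then fs=yes; fi
  if [ -e "$dir/database-in-use" ]; then db=yes; fi
  case "$fs/$db" in
    yes/no)  echo "filesystem-in-use only" ;;
    no/yes)  echo "database-in-use only" ;;
    yes/yes) echo "both lockfiles present" ;;
    *)       echo "no lockfiles" ;;
  esac
}

# Usage: lockfile_state /path/to/rootdirectory
```

On an instance that has already migrated to the metadata database, "filesystem-in-use only" corresponds to the state left behind by the importer bug described above, while "database-in-use only" is the intended result.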

Customer report: See gitlab#423459 (comment 2903452155) for a specific example of this issue in production.

Recovery Strategy

The fixes in this issue focus on making the registry gracefully handle inconsistent lockfile states:

  • Detection: The registry should detect when lockfiles are in an unexpected state
  • Auto-recovery: The registry should automatically correct lockfile states during startup when it is safe to do so
  • Logging: The registry should log clearly whenever it corrects a lockfile state

The goal is to allow affected instances to recover without manual intervention while providing visibility into what corrections were made.

Action Items

  • Merge !2647 (merged) (feature flag default change)
  • Merge !2648 (merged) (importer fix)
  • Merge !2649 (app startup fix)
  • Implement graceful handling of existing inconsistent lockfile states
  • Backport fixes to GitLab 18.6 by patching v4.31.0-gitlab

Related Issues and MRs

Merge Requests:

  • !2647 (merged) - fix(registry): disable REGISTRY_FF_ENFORCE_LOCKFILES by default
  • !2648 (merged) - fix(datastore): importer: manage lockfiles when REGISTRY_FF_ENFORCE_LOCKFILES is disabled
  • !2649 - Draft: fix(handlers): manage lockfiles on app start when REGISTRY_FF_ENFORCE_LOCKFILES is disabled

Related Issues:

Edited by Jaime Martinez