fix: gracefully handle inconsistent lockfile states from REGISTRY_FF_ENFORCE_LOCKFILES misuse
Problem Summary
The container registry's REGISTRY_FF_ENFORCE_LOCKFILES feature flag was incorrectly implemented in multiple critical code paths. Instead of controlling whether lockfiles are enforced, the flag was used to control whether lockfiles are managed at all. This has resulted in inconsistent lockfile states across many self-managed instances that have migrated to the metadata database.
Severity: S2 (High) - Workaround available via manual lockfile management or feature flag disabling
Workaround Instructions
If you're experiencing registry startup failures with the error "registry filesystem metadata in use, please import data before enabling the database" after migrating to the metadata database, you can use one of the following workarounds depending on your GitLab installation method.
Option 1: Set the Feature Flag (Recommended)
This prevents the lockfile enforcement issue and is the safest approach.
For Linux Package (Omnibus) Installations
Add the following to your /etc/gitlab/gitlab.rb file:
registry['env'] = {
  'REGISTRY_FF_ENFORCE_LOCKFILES' => false,
}
Then reconfigure GitLab:
sudo gitlab-ctl reconfigure
For Docker/Docker Compose Installations
Important: Setting the environment variable directly in Docker Compose does NOT work. You must configure it through gitlab.rb.
Create or edit your gitlab.rb configuration file and add:
registry['env'] = {
  'REGISTRY_FF_ENFORCE_LOCKFILES' => false,
}
Mount this configuration file in your Docker Compose setup and ensure GitLab reconfigures on startup.
For Helm Chart (Kubernetes) Installations
Add the following to your values.yaml:
registry:
  extraEnv:
    REGISTRY_FF_ENFORCE_LOCKFILES: "false"
Then upgrade your Helm release:
helm upgrade gitlab gitlab/gitlab -f values.yaml
Option 2: Manual Lockfile Removal
Only use this option if you are confident your registry is supposed to be using the metadata database.
Locate the Lockfile
The lockfile location depends on your storage configuration:
- Local filesystem storage: </path/to/rootdirectory>/docker/registry/lockfiles/filesystem-in-use
- S3 or object storage: The lockfile will be in your S3 bucket at the same path: docker/registry/lockfiles/filesystem-in-use
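For local filesystem storage, the lockfile check can be scripted. The following is a minimal sketch, not a GitLab tool: the function name `lockfile_status` is hypothetical, and the `database-in-use` lockfile is assumed to live in the same `lockfiles` directory as `filesystem-in-use`, which this issue does not state explicitly.

```python
from pathlib import Path

# Lockfile directory relative to the storage root, taken from the path above.
LOCKFILE_DIR = Path("docker/registry/lockfiles")


def lockfile_status(root_directory: str) -> dict[str, bool]:
    """Report which lockfiles exist under a local filesystem storage root.

    Assumption: database-in-use sits next to filesystem-in-use.
    """
    base = Path(root_directory) / LOCKFILE_DIR
    return {
        name: (base / name).is_file()
        for name in ("filesystem-in-use", "database-in-use")
    }
```

Calling `lockfile_status("/path/to/rootdirectory")` returns a dictionary with one boolean per lockfile, which makes it easy to see whether `filesystem-in-use` is still present before deciding to remove it.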
Remove the Lockfile
For local filesystem:
rm /path/to/rootdirectory/docker/registry/lockfiles/filesystem-in-use
For S3 storage:
Use the AWS CLI or your S3 management console to delete the docker/registry/lockfiles/filesystem-in-use object from your bucket.
Note: After removing the lockfile, restart your registry service.
Verification
After applying either workaround, verify the registry is functioning:
# Check registry status
sudo gitlab-ctl status registry
# Check registry logs
sudo gitlab-ctl tail registry
The registry should start successfully without the lockfile error.
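The log check can also be scripted. This is a minimal sketch, assuming the registry log output has already been captured as text (for example from the tail command above); the function name `has_lockfile_error` is hypothetical, and the error string is the one quoted in the workaround instructions.

```python
# Error message quoted in the workaround instructions.
LOCKFILE_ERROR = (
    "registry filesystem metadata in use, "
    "please import data before enabling the database"
)


def has_lockfile_error(log_text: str) -> bool:
    """Return True if any log line contains the lockfile startup error."""
    return any(LOCKFILE_ERROR in line for line in log_text.splitlines())
```

A substring match is used so the check works whether the message appears as plain text or embedded in a structured log line.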
What Went Wrong
The feature flag check was placed at the wrong level in several functions, causing an early return that prevented essential lockfile operations from executing:
1. Importer Lockfile Handling (!2648 (merged))
In the importer's handleLockers function, when REGISTRY_FF_ENFORCE_LOCKFILES was disabled (the default), the function would return early without:
- Unlocking the filesystem-in-use lockfile after successful import
- Locking the database-in-use lockfile to prevent the registry from starting in filesystem mode
This resulted in imported registries with the filesystem-in-use lockfile present, but not the database-in-use lockfile, the opposite of the intended result.
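The control flow behind this bug can be sketched as follows. This is an illustrative Python model, not the registry's Go code: only the name `handleLockers` and the two lockfile names come from this issue, while the set-based lock store and both function names are hypothetical.

```python
FILESYSTEM_LOCK = "filesystem-in-use"
DATABASE_LOCK = "database-in-use"


def handle_lockers_buggy(locks: set[str], enforce_lockfiles: bool) -> None:
    """Model of the faulty behaviour: the flag gates lockfile management itself."""
    if not enforce_lockfiles:
        return  # early return: neither lockfile is touched after the import
    locks.discard(FILESYSTEM_LOCK)
    locks.add(DATABASE_LOCK)


def handle_lockers_fixed(locks: set[str]) -> None:
    """Model of the intended behaviour: lockfiles are always managed after an import."""
    locks.discard(FILESYSTEM_LOCK)  # unlock filesystem-in-use
    locks.add(DATABASE_LOCK)        # lock database-in-use


# An instance that started in filesystem mode and then ran a successful import.
buggy = {FILESYSTEM_LOCK}
handle_lockers_buggy(buggy, enforce_lockfiles=False)

fixed = {FILESYSTEM_LOCK}
handle_lockers_fixed(fixed)

print(sorted(buggy))  # ['filesystem-in-use']
print(sorted(fixed))  # ['database-in-use']
```

With the flag disabled, the buggy variant leaves the instance in exactly the state described above, while the fixed variant swaps the lockfiles regardless of the flag.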
2. Application Startup Lockfile Handling (!2649)
The same problem occurred during registry application startup: lockfiles were not managed when the feature flag was disabled.
3. Feature Flag Default Value (!2647 (merged))
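The feature flag originally defaulted to false, and was changed to true as part of the normal feature flag removal process.
With enforcement on by default, instances left with a stale filesystem-in-use lockfile started failing at startup, which exposed the bug.
We're setting the default back to false while the fixes are rolled out.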
What We're Fixing
This issue tracks three related merge requests that address the lockfile management problems:
- !2647 (merged) - Disable REGISTRY_FF_ENFORCE_LOCKFILES by default
  - Changes the feature flag default to false
  - Prevents existing instances from encountering the issue
- !2648 (merged) - Fix importer lockfile handling
  - Removes the feature flag check from handleLockers
  - Ensures lockfiles are properly managed after imports complete
  - Fixes: filesystem lockfiles not being removed and database lockfiles not being set
- !2649 - Fix application startup lockfile handling
  - Ensures lockfiles are managed during app initialization regardless of feature flag state
Impact
Who is affected:
- Any instance that has migrated to the metadata database while REGISTRY_FF_ENFORCE_LOCKFILES was disabled
- Instances may have:
  - filesystem-in-use lockfiles that were never removed after migration
  - Missing database-in-use lockfiles
  - Inconsistent lockfile states that could cause startup issues
Customer report: See gitlab#423459 (comment 2903452155) for a specific example of this issue in production.
Recovery Strategy
The fixes in this issue focus on making the registry gracefully handle inconsistent lockfile states:
- Detection: Registry should detect when lockfiles are in an unexpected state
- Auto-recovery: Automatically correct lockfile states during startup when safe to do so
- Logging: Provide clear logging when lockfile state corrections occur
The goal is to allow affected instances to recover without manual intervention while providing visibility into what corrections were made.
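One way to picture the detection and logging steps is as a comparison between the lockfiles that exist and the lockfiles the active mode expects. The sketch below is a hypothetical Python model, not the planned implementation: the function name `plan_corrections`, the action labels, and the rule that exactly one lockfile is expected per mode are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("lockfile_recovery")

FILESYSTEM_LOCK = "filesystem-in-use"
DATABASE_LOCK = "database-in-use"


def plan_corrections(database_enabled: bool, present: set[str]) -> list[tuple[str, str]]:
    """Return the (action, lockfile) steps needed to reach the expected state.

    Assumption: exactly one lockfile is expected, matching the active mode.
    """
    expected = {DATABASE_LOCK} if database_enabled else {FILESYSTEM_LOCK}
    # Detection: anything present but not expected is stale, and vice versa.
    steps = [("remove", name) for name in sorted(present - expected)]
    steps += [("create", name) for name in sorted(expected - present)]
    for action, name in steps:
        # Logging: make every planned correction visible to the operator.
        logger.warning("inconsistent lockfile state: would %s %s", action, name)
    return steps
```

For an affected instance, `plan_corrections(True, {"filesystem-in-use"})` returns `[("remove", "filesystem-in-use"), ("create", "database-in-use")]`, and a consistent instance yields an empty list. Whether a given correction is safe to apply automatically is a separate decision that this sketch does not model.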
Action Items
- Merge !2647 (merged) (feature flag default change)
- Merge !2648 (merged) (importer fix)
- Merge !2649 (app startup fix)
- Implement graceful handling of existing inconsistent lockfile states
- Backport fixes to GitLab 18.6 by patching v4.31.0-gitlab
Related Issues and MRs
Merge Requests:
- !2647 (merged) - fix(registry): disable REGISTRY_FF_ENFORCE_LOCKFILES by default
- !2648 (merged) - fix(datastore): importer: manage lockfiles when REGISTRY_FF_ENFORCE_LOCKFILES is disabled
- !2649 - Draft: fix(handlers): manage lockfiles on app start when REGISTRY_FF_ENFORCE_LOCKFILES is disabled
Related Issues:
- gitlab#423459 - Feedback issue: Next generation container registry rollout to self-managed
- gitlab#423459 (comment 2903452155) - Specific customer report of lockfile issues