Investigate Using Lock Files to Preserve Data Consistency During Self-Managed Imports
Context
As part of Design an In Place Migration Procedure for Self... (#884 - closed) we described a three-step import for self-managed instances moving to database metadata. In particular, data consistency was identified as a special area of concern.
Problem
Currently, the import tool does not ensure data consistency. The user must manage their import procedure and the reconfiguration of the registry appropriately to ensure safe data access. We have a similar issue with offline garbage collection: there is no mechanism to prevent writes during garbage collection, which could result in inconsistent data. Therefore, if we do not provide additional safety features, we are not asking self-managed admins to take on a responsibility beyond what is already expected of them.
However, unlike offline garbage collection, the import will touch significantly more container registry data. In particular, the import process is concerned with preserving referenced data which the user would like to keep, which is the inverse of what the offline garbage collector does. This heightens the potential consequences of unsafe data access occurring during import.
Ideal Behavior
Thinking in terms of ease of use, the import tool should handle as many of the data consistency operations automatically as possible. This will reduce the number of steps the user has to perform during the happy path, and with it the risk of performing them in the wrong order or forgetting them. We've also seen some users say they run offline garbage collection without taking the registry down. This isn't a good idea with garbage collection, but it is a significantly worse idea with the import. Therefore, automatic handling would also prevent a certain "fast and loose" attitude from carrying over to the import process.
Lock Files
Without the metadata database or a shared cache like Redis, the only mechanism we have to coordinate registry behavior across processes is the object storage backend. It should be possible to write "lock files" which the registry can read to help enforce safe behavior. Object storage has severe limitations, particularly that not every storage backend guarantees read-after-write consistency. However, we can anticipate low enough write and read activity across these files that this is not a significant concern.
We should be aware that these files will inevitably be left on the filesystem at times when they should have been removed automatically. Therefore, as much as possible we should try to ensure:
- removing a lock file restores the "default" behavior of the registry
- actions prevented via a lock file must result in a log message that explains why the action was prevented and how the admin can remove the file to restore functionality
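The two properties above can be sketched roughly as follows. This is a minimal illustration, not the registry's implementation: the `lockStore` interface, the in-memory `memStore`, the `checkLock` helper, and the `/locks/import-in-progress` path are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// lockStore is a hypothetical, minimal view of the storage backend.
type lockStore interface {
	Exists(path string) (bool, error)
}

// memStore is an in-memory stand-in for object storage, for illustration only.
type memStore map[string]bool

func (m memStore) Exists(path string) (bool, error) { return m[path], nil }

// errLocked marks an action that was prevented by a lock file.
var errLocked = errors.New("action prevented by lock file")

// checkLock returns nil when no lock file exists (the default behavior).
// Otherwise it returns an error that says which action was blocked, by which
// file, and how to restore functionality.
func checkLock(s lockStore, path, action string) error {
	locked, err := s.Exists(path)
	if err != nil {
		return fmt.Errorf("checking lock file %q: %w", path, err)
	}
	if !locked {
		return nil
	}
	return fmt.Errorf("%w: %s is blocked by %q; remove this file from the storage backend to restore default behavior",
		errLocked, action, path)
}

func main() {
	const lockPath = "/locks/import-in-progress" // hypothetical path
	store := memStore{lockPath: true}

	if err := checkLock(store, lockPath, "blob upload"); err != nil {
		fmt.Println(err) // in the registry this would be a log message
	}

	// Removing the lock file restores the default behavior.
	delete(store, lockPath)
	fmt.Println("allowed:", checkLock(store, lockPath, "blob upload") == nil)
}
```

Because the absence of the file always means "behave as usual", an admin can recover from a stale lock simply by deleting it.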
Monitoring
In order to properly influence the behavior of running registry processes, we'll need to spawn health-check-like goroutines that periodically check for the existence of these files. We'll also need to pause long enough for these periodic checks to happen, much as we did with the imports for .com, such as here.
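A periodic check of this kind might look like the sketch below. It assumes a hypothetical `lockStore` interface over the storage backend and a hypothetical `/locks/read-only` path. Keeping the last known state when a check fails is an assumption made for the example, not a decision taken in this issue.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// lockStore is a hypothetical, minimal view of the storage backend.
type lockStore interface {
	Exists(path string) (bool, error)
}

// memStore is an in-memory stand-in for object storage, for illustration only.
type memStore struct{ present atomic.Bool }

func (m *memStore) Exists(string) (bool, error) { return m.present.Load(), nil }

// lockMonitor caches whether a lock file currently exists.
type lockMonitor struct {
	store  lockStore
	path   string
	locked atomic.Bool
}

// refresh performs a single existence check.
func (m *lockMonitor) refresh() {
	found, err := m.store.Exists(m.path)
	if err != nil {
		return // assumption: keep the last known state on a failed check
	}
	m.locked.Store(found)
}

// run checks immediately and then once per interval until ctx is cancelled.
func (m *lockMonitor) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		m.refresh()
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// Locked reports the most recently observed state.
func (m *lockMonitor) Locked() bool { return m.locked.Load() }

func main() {
	store := &memStore{}
	mon := &lockMonitor{store: store, path: "/locks/read-only"} // hypothetical path

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go mon.run(ctx, 10*time.Millisecond)

	// The writer of the lock file must wait at least one check interval
	// before assuming that running registries have observed it.
	store.present.Store(true)
	time.Sleep(50 * time.Millisecond)
	fmt.Println("locked:", mon.Locked())
}
```

The sleep in `main` mirrors the pause described above: whoever writes the lock has to wait longer than the check interval of every running registry before relying on it.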
Application Areas
Import Phase
The files could ensure that for stepped imports, each phase is only executed after the successful completion of the previous phase. We could also use the database in this instance, so we have some flexibility here.
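As a rough sketch of the file-based option, each step could write a completion marker that the next step requires before it runs. The `markerStore` type, the marker path format, and the helper names are hypothetical and stand in for reads and writes against object storage.

```go
package main

import "fmt"

// markerStore is an in-memory stand-in for object storage, for illustration only.
type markerStore map[string]bool

// stepMarker returns the (hypothetical) path of a step's completion marker.
func stepMarker(step int) string {
	return fmt.Sprintf("/import/step-%d-complete", step)
}

// canRunStep allows step 1 unconditionally and any later step only if the
// previous step's completion marker exists.
func canRunStep(s markerStore, step int) error {
	if step < 1 {
		return fmt.Errorf("invalid import step %d", step)
	}
	if step == 1 {
		return nil
	}
	prev := stepMarker(step - 1)
	if !s[prev] {
		return fmt.Errorf("cannot run import step %d: step %d has not completed (missing %s)",
			step, step-1, prev)
	}
	return nil
}

// completeStep records the successful completion of a step.
func completeStep(s markerStore, step int) { s[stepMarker(step)] = true }

func main() {
	s := markerStore{}
	fmt.Println(canRunStep(s, 2)) // refused: step 1 has not completed
	completeStep(s, 1)
	fmt.Println(canRunStep(s, 2)) // <nil>
}
```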
Read-Only Management
These files could be used to dynamically signal running registry instances to switch to read-only mode if they're not configured as such.
While the registry was not designed to do something like reconfigure on catching a HUP, the read-only mechanism for HTTP requests is dynamically evaluated at request time.
The catch here is that we would still be running upload purging. This won't affect the import, but it would be a departure from the behavior of a registry that came online configured in read-only mode.
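To illustrate why request-time evaluation makes this feasible, the sketch below uses a generic `net/http` middleware that consults a flag on every request. It is not the registry's actual handler code: `readOnly`, `readOnlyGuard`, and the choice of a 405 response are assumptions for the example, and the flag would be set by whatever component watches the lock file.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readOnly would be flipped by the component watching the lock file (hypothetical).
var readOnly atomic.Bool

// isWrite treats every method other than GET, HEAD, and OPTIONS as a write.
func isWrite(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return false
	}
	return true
}

// readOnlyGuard evaluates the flag per request, so a running process can
// switch modes without a restart or reconfiguration.
func readOnlyGuard(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if readOnly.Load() && isWrite(r.Method) {
			http.Error(w, "registry is in read-only mode", http.StatusMethodNotAllowed)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the real request handlers.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// status sends one request through h and returns the response code.
func status(h http.Handler, method string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/v2/", nil))
	return rec.Code
}

func main() {
	h := readOnlyGuard(okHandler)
	fmt.Println(status(h, http.MethodPut)) // 200
	readOnly.Store(true)
	fmt.Println(status(h, http.MethodPut)) // 405
	fmt.Println(status(h, http.MethodGet)) // 200
}
```

Note that a guard like this only covers HTTP requests. Background work such as upload purging is started separately, which is the behavioral difference described above.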
Failed Import Lock
This lock would be placed if a migration failed and would prevent the metadata database from being enabled. This would serve to ensure that a database with partially or incorrectly imported metadata is not accidentally used to serve requests.
Metadata Database Enabled
This lock file would serve to prevent registries which have been run with the metadata database from starting if the database is not enabled. This would prevent misconfigurations (or rollbacks) from accidentally writing metadata to object storage. We should not need a health check process here, since registry configuration does not change after startup.
The importer would write this file at the end of step 2, once the import has been validated. Fresh installations with the database would instead write this file on first use.
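Taken together, the failed import lock and the database-enabled lock amount to a one-time check at startup, sketched below. The `locks` struct, the error values, and `validateStartup` are hypothetical. In practice the two booleans would come from checking for the corresponding files in object storage before the registry begins serving requests.

```go
package main

import (
	"errors"
	"fmt"
)

// locks records which lock files were found in storage at startup.
type locks struct {
	failedImport  bool // a previous import failed
	databaseInUse bool // this storage has been used with the metadata database
}

var (
	errFailedImport = errors.New(
		"a previous import failed: refusing to enable the metadata database until the failed import lock is removed")
	errDatabaseRequired = errors.New(
		"this registry has been run with the metadata database: refusing to start with the database disabled")
)

// validateStartup is evaluated once, since configuration does not change
// after startup.
func validateStartup(l locks, databaseEnabled bool) error {
	switch {
	case databaseEnabled && l.failedImport:
		return errFailedImport
	case !databaseEnabled && l.databaseInUse:
		return errDatabaseRequired
	}
	return nil
}

func main() {
	// Rollback or misconfiguration: database disabled after it was in use.
	fmt.Println(validateStartup(locks{databaseInUse: true}, false))
	// Partially imported metadata must not serve requests.
	fmt.Println(validateStartup(locks{failedImport: true}, true))
	// Normal operation with the database enabled.
	fmt.Println(validateStartup(locks{databaseInUse: true}, true)) // <nil>
}
```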