Support for stale locks
In our use of flufl.lock, we have encountered a few ways a lock can become stale:
- We set the lock lifetime to 12+ hours. Process A shuts down abruptly and doesn't clean up the lock. Another process then tries to acquire the lock, but it hasn't expired yet.
- The lockfile gets corrupted. One way I can think of this happening is if the filesystem itself gets corrupted by a bad OS rollout, or if someone modifies the lockfile directly.
- Operating on the same lockfile with different user permissions.
I'm curious whether there are plans to support cleaning up stale locks. Here are a few conditions I'm considering for handling each case of staleness, assuming the lockfile still exists:
Case # 1: Lock not owned by any process
Check self.details to see whether the hostname matches and whether the pid is still alive. If the hostname is different or the pid doesn't exist, the lock can be cleaned up. If the pid exists, we could support a hack that lets users specify whether the process name matches some user-provided description.
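A minimal sketch of this check, assuming a POSIX system and that the hostname and pid have already been read from self.details. The helper names pid_is_alive and owner_looks_dead are hypothetical and not part of flufl.lock, and comparing against socket.gethostname() is an assumption about the form in which the hostname was recorded.

```python
import os
import socket
from typing import Optional


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only validates the pid
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def owner_looks_dead(lock_hostname: str, lock_pid: int,
                     local_hostname: Optional[str] = None) -> bool:
    """Apply the Case #1 rule: foreign hostname or missing pid means stale."""
    if local_hostname is None:
        # Assumption: the lock recorded the hostname in this same form.
        local_hostname = socket.gethostname()
    if lock_hostname != local_hostname:
        return True  # the proposal treats a different hostname as stale
    return not pid_is_alive(lock_pid)
```

A live pid only shows that some process has that pid. Pids get reused, which is why the process-name check mentioned above may still be needed.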
Case # 2: Lockfile gets corrupted
I think an easy way to catch this is to call self.details. If it raises any kind of exception, chances are the lockfile is corrupted.
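A rough sketch of that probe, written against a plain callable so it doesn't depend on flufl.lock being importable. The name probe_details and the assumed (hostname, pid, lockfile name) shape of the result are illustrative assumptions, not the library's API.

```python
from typing import Callable, Optional, Tuple

Details = Tuple[str, int, str]  # assumed shape: (hostname, pid, lockfile name)


def probe_details(
    read_details: Callable[[], Details],
) -> Tuple[Optional[Details], Optional[Exception]]:
    """Return (details, None) if they look sane, else (None, the error)."""
    try:
        hostname, pid, filename = read_details()
        pid = int(pid)
        if not hostname or pid <= 0:
            raise ValueError(f"implausible lock details: {hostname!r}, {pid!r}")
    except Exception as exc:  # deliberately broad: this is only a diagnostic
        return None, exc
    return (hostname, pid, filename), None
```

With a real lock this might be called as probe_details(lambda: lock.details). One caveat: an exception can also mean the lock simply isn't held at all (as far as I know, flufl.lock raises NotLockedError in that case), so the exception type should be inspected before concluding that the file is corrupted.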
Case # 3: Permissions Issues
Try switching to root if we're sure the lock is stale based on either of the checks above, or if the lockfile has expired according to self._releasetime.
Potential Problems
One problem that comes to mind is that this can introduce race conditions. We might be able to order the operations in a way that minimizes them (for example, checking that the lockfile exists, checking first whether the pid corresponds to a live process, or re-checking the lock right before acquiring it). To summarize the issue briefly:
1. The lock is stale
2. Process A checks whether the lock is stale
3. Process B checks whether the lock is stale
4. Process A cleans up the lock
5. Process A grabs the lock
6. Process B cleans up the lock
7. Process B grabs the lock
Both Process A and Process B are now inside the critical section, so the lock no longer provides mutual exclusion.
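The interleaving above can be replayed with a toy in-memory model. This is only an illustration: replay and its step names are hypothetical, the "lockfile" is a single variable, and the recheck option treats "re-check and remove" as one indivisible step, which a real filesystem does not give us for free. It shows why re-checking narrows the window, not that it closes it.

```python
def replay(steps, recheck=False):
    """Replay (process, action) steps against a fake lockfile.

    Returns the list of processes that believe they acquired the lock.
    """
    lock_owner = "stale"    # contents of the fake lockfile; None means no file
    believes_stale = set()  # processes whose earlier check said "stale"
    holders = []
    for proc, action in steps:
        if action == "check":
            if lock_owner == "stale":
                believes_stale.add(proc)
        elif action == "cleanup":
            if proc in believes_stale:
                # Without recheck, the process acts on its old observation
                # and removes whatever lock is there now.
                if not recheck or lock_owner == "stale":
                    lock_owner = None
        elif action == "grab":
            if lock_owner is None:
                lock_owner = proc
                holders.append(proc)
    return holders


steps = [
    ("A", "check"), ("B", "check"),
    ("A", "cleanup"), ("A", "grab"),
    ("B", "cleanup"), ("B", "grab"),
]
print(replay(steps))                # ['A', 'B']: both think they hold the lock
print(replay(steps, recheck=True))  # ['A']: B notices the lock is no longer stale
```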
I'm not really sure how to resolve this yet. Guarding the cleanup with another lock just recreates the same staleness problem for that lock. I'm curious to hear your thoughts. Perhaps, instead of actually modifying the lock, only checking for staleness to help with diagnostics would be a good intermediate step.