Reduce "Cannot obtain an exclusive lease" log noise
Problem
For example, Geo uses ExclusiveLeaseGuard in many workers for benign purposes, such as deduplicating work.
Depending on usage, the logs may be flooded with error-level lines: "Cannot obtain an exclusive lease. There must be another instance already in execution."
In these benign cases, the logging is a net negative. It's wasteful and is a common red herring for customers during troubleshooting.
Proposal
For short leases, we could probably get away without logging.
For long leases, we should log these at info level, and the message should be more helpful, like "There is already another Geo::MetricsUpdateWorker running so there is no need for this instance to continue. You can see running jobs at Admin Area > Monitoring > Background Jobs > Busy".
For Geo's scheduler workers, we can mark them idempotent and use the :until_executed deduplication strategy so that these lease blocks are not hit at all when another instance of the scheduler worker is running. See the closed spike MR !97438. We should investigate other Geo workers to see if the same approach can be used.
Note that the Gitlab::Geo::LogCursor::Lease class also outputs this message to both stdout and the logger at debug level. We should change that as well.
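The short-lease versus long-lease logging proposal could look roughly like the sketch below. It uses only Ruby's standard Logger. The helper names, the keyword arguments, and the 60-second threshold are all assumptions made for illustration; the issue does not define what counts as a "short" lease, and this is not the existing ExclusiveLeaseGuard code.

```ruby
require "logger"

# Hypothetical cutoff between "short" and "long" leases, in seconds.
SHORT_LEASE_SECONDS = 60

# Builds the more helpful message suggested in the proposal.
def lease_taken_message(worker_name)
  "There is already another #{worker_name} running so there is no need " \
    "for this instance to continue. You can see running jobs at " \
    "Admin Area > Monitoring > Background Jobs > Busy"
end

# Logs at info level for long leases and stays silent for short ones.
# Returns true if a line was logged, false otherwise.
def log_lease_taken(logger, worker_name:, lease_timeout:)
  return false if lease_timeout <= SHORT_LEASE_SECONDS

  logger.info(lease_taken_message(worker_name))
  true
end

# Example usage: a one-hour lease is logged, a ten-second lease is not.
logger = Logger.new($stdout)
log_lease_taken(logger, worker_name: "Geo::MetricsUpdateWorker", lease_timeout: 3600)
log_lease_taken(logger, worker_name: "Geo::MetricsUpdateWorker", lease_timeout: 10)
```

Keying the decision on the lease timeout keeps the change local to the place where the lease failure is reported, so individual workers would not need their own logging overrides.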