Cannot obtain an exclusive lease for ci/pipeline_processing/atomic_processing_service::pipeline_id:xxxxx
Summary
GitLab generates the following error, which it should not:
Cannot obtain an exclusive lease for
ci/pipeline_processing/atomic_processing_service::pipeline_id:79707.
There must be another instance already in execution.
This error has been investigated several times as the potential root cause of a GitLab issue, and in every case this only delayed resolution of the actual issue.
Ignore this error. This bug issue has been raised to request that the product stop generating it.
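For context, the exclusive lease is GitLab's Redis-backed mutex; the atomic processing service takes one per pipeline so that only a single process recalculates pipeline status at a time, and a concurrent attempt losing the race is normal. A minimal in-memory sketch of the try-obtain pattern (simplified and illustrative only; the real `Gitlab::ExclusiveLease` is backed by Redis `SET NX` with a TTL, and the class below is not GitLab's API):

```ruby
# Simplified illustration of an exclusive lease: the first caller to
# try_obtain wins; later callers get nil until the lease expires or is
# cancelled. This in-memory version only stands in for the Redis-backed one.
class SimpleLease
  @store = {}

  class << self
    # Returns a lease token on success, nil if another holder exists.
    def try_obtain(key, timeout:, now: Time.now)
      entry = @store[key]
      return nil if entry && entry[:expires_at] > now # still held elsewhere

      token = rand(2**64).to_s(16)
      @store[key] = { token: token, expires_at: now + timeout }
      token
    end

    # Release only if we still hold the lease (token matches).
    def cancel(key, token)
      @store.delete(key) if @store.dig(key, :token) == token
    end
  end
end

key = "ci/pipeline_processing/atomic_processing_service::pipeline_id:79707"
first  = SimpleLease.try_obtain(key, timeout: 300) # wins the lease
second = SimpleLease.try_obtain(key, timeout: 300) # nil: the losing attempt
SimpleLease.cancel(key, first)
third  = SimpleLease.try_obtain(key, timeout: 300) # succeeds after release
```

The second, losing attempt is the situation that produces the log entry above: it is an expected outcome of the locking design, not a fault.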
Additional details
During a customer emergency call, one of the log entries which we acted upon was:
{
"severity": "ERROR",
"time": "2021-05-06T08:09:46.007Z",
"correlation_id": "01F50BKFPPTX822RCGGAB7V37B",
"message": "Cannot obtain an exclusive lease for ci/pipeline_processing/atomic_processing_service::pipeline_id:79707. There must be another instance already in execution."
}
These appear in application.log / application_json.log.
Merge requests were failing, and I thought this could be related, as MRs need to know pipeline status.
We shut down Rails and Sidekiq and ran:
sudo gitlab-rake gitlab:exclusive_lease:clear
This wasn't the cause of the issue, which was completely unrelated to Category:Continuous Integration. Post-emergency, the customer's instance is working normally, with no reported issues with pipelines, but the log entries continue.
The purpose of this issue is to establish whether there is an issue to address within ~"group::continuous integration".
GitLab team members can read more in the ticket relating to the emergency (which involved an upgrade from %13.7 to %13.11) and in other customers' tickets and messages where we found the same errors (links provided for GitLab team members):
- Ticket relating to a gitlab-ci.yml issue (GitLab %13.9): the lease log entries needed to be eliminated from the investigation.
- Ticket relating to artifacts on %13.10 - the customer saw the lease log entries and queried them.
- Ticket relating to performance and S3 objects %13.11
- Ticket relating to errors fetching pipelines in UI %13.11 - customer found these lease log entries
- A ticket relating to Jira %13.10
- A 13.7 instance - restarting parts of their GitLab fixed a problem with MRs.
- Another 13.7 instance - Gitaly issues - the investigation centred on these lease issues for some time.
- gitlab-org/charts/gitlab#2513 (comment 484744582) mentions this log entry.
- Forum entry
- It also occurs on gitlab.com, see logs link below.
To try to track down when this might have started:
- Ticket for a 13.5 instance, and another 13.5 instance.
Related issues
This customer is running Geo, so #212756 (closed) looks tempting; however, that issue is for Geo::MetricsUpdateWorker, not for ci/pipeline_processing/atomic_processing_service.
There's also a problem with similar errors on gitlab.com, raised in #326030 (closed). However, the log entry is different (it doesn't include the entity with the lease conflict), and cross-checking the correlation ID, I see AuthorizedProjectsWorker. This seems consistent with the analysis on that issue.
Steps to reproduce
Unknown
Example Project
n/a - doesn't seem project-specific.
What is the current bug behavior?
This lease log entry doesn't appear to adversely affect the product, but, as in the similar Geo example #212756 (closed), because it is flagged ERROR both customers and support engineers have focused on it during multiple investigations.
Resolution of the outage that triggered this customer's emergency, and of a number of other investigations, has been delayed by investigating this error.
What is the expected correct behavior?
This lease log entry is not produced.
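One plausible shape for a fix (purely illustrative; the service class, logger, and method names below are assumptions, not GitLab's actual implementation) is to treat a failed lease attempt as an expected no-op and record it at a low severity rather than ERROR:

```ruby
require "logger"
require "stringio"

# Sketch: when another instance already holds the lease, a concurrent
# attempt is expected behaviour, so note it quietly instead of raising
# an ERROR-level log entry. Names here are hypothetical.
def process_with_lease(lease_obtained, logger)
  unless lease_obtained
    # Expected when two workers race on the same pipeline.
    logger.info("Lease already taken; skipping duplicate pipeline processing")
    return :skipped
  end
  :processed
end

buffer = StringIO.new
logger = Logger.new(buffer)

result = process_with_lease(false, logger)
# result == :skipped, and the log line is INFO, not ERROR
```

Whether the entry should be downgraded or suppressed entirely is for ~"group::continuous integration" to decide; the point is that a routine lease conflict should not surface as ERROR.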
Relevant logs and/or screenshots
See above.
Output of checks
This happens on gitlab.com:
https://log.gprd.gitlab.net/goto/0e7b90ccee2a1a3ea83164a51b4c2253