Runner unregistered via auth token still appears in UI and picks up jobs
Summary
Removing a runner via auth token returns a successful unregister message, but the runner remains in the GitLab UI and still picks up jobs.
Steps to reproduce
- Stood up GitLab Linux package 16.0.5.
- Stood up GitLab Runner Linux package 16.0.0.
- Registered the runner via auth token.
- Confirmed the runner was online and ran a job; the job ran.
- Removed the runner with the auth token. At 9:40 am the runner still shows on the refreshed runners page.
- The issue persists at 9:50 am. The runner shows as having just been in contact.
- Restarted the runner with `sudo gitlab-runner restart`.
- The unregistered runner persists in Admin > CI/CD > Runners.
- Attempted to start a CI job; the job runs.
Repeated the steps with GitLab Linux package 16.2 and runner 16.2. The issue persists in the latest version.
Example Project
I have taken down my instances to stand up a newer version of GitLab and see if the issue persists. I have stood up 16.2 to test, but I'm currently hitting technical difficulties.
What is the current bug behavior?
A runner removed via auth token with the `gitlab-runner unregister` command is not removed from the UI and still accepts jobs.
What is the expected correct behavior?
The runner should be deactivated and removed.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
(@pedropombeiro): We could make the `Ci::Runner.offline` scope return either:
- runners that were last contacted more than 2 hours ago;
- or runners that, even if contacted recently, have no associated runner managers.
The challenge here will be query performance, since we'll be working with table unions that require deduplication.
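The two conditions above can be checked in plain Ruby, outside Rails, by classifying a few runners under the current and the proposed rules. This is a minimal sketch under stated assumptions: `RunnerRecord`, its `manager_count` field, and the helper methods are hypothetical stand-ins for `Ci::Runner`, its `runner_managers` association, and the scopes, and the 2-hour timeout is taken from the condition above rather than read from GitLab.

```ruby
# Hypothetical in-memory model, not GitLab's Ci::Runner: only the fields needed
# to compare the current and proposed online/offline rules.
ONLINE_CONTACT_TIMEOUT = 2 * 60 * 60 # 2 hours in seconds (assumed, per the condition above)

RunnerRecord = Struct.new(:id, :contacted_at, :manager_count)

def online_contact_deadline(now)
  now - ONLINE_CONTACT_TIMEOUT
end

# Current rule: online if last contact is after the deadline.
def currently_online?(runner, now)
  !runner.contacted_at.nil? && runner.contacted_at > online_contact_deadline(now)
end

# Proposed rule: additionally require at least one runner manager.
def proposed_online?(runner, now)
  currently_online?(runner, now) && runner.manager_count.positive?
end

# Proposed offline rule: contacted at or before the deadline, OR no runner managers.
def proposed_offline?(runner, now)
  contacted_long_ago = !runner.contacted_at.nil? &&
                       runner.contacted_at <= online_contact_deadline(now)
  contacted_long_ago || runner.manager_count.zero?
end

now = Time.now
unregistered = RunnerRecord.new(1, now - 5 * 60, 0)  # contacted 5 minutes ago, no managers
registered   = RunnerRecord.new(2, now - 5 * 60, 1)  # contacted 5 minutes ago, one manager
stale        = RunnerRecord.new(3, now - 3 * 60 * 60, 1) # last contact 3 hours ago

[unregistered, registered, stale].each do |r|
  puts format("runner %d: current online=%s, proposed online=%s, proposed offline=%s",
              r.id, currently_online?(r, now), proposed_online?(r, now), proposed_offline?(r, now))
end
```

Runner 1 models the reported situation: it is online under the current rule only because its last contact is recent, and the proposed rule reports it as offline because no runner manager remains.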
Proposed patch (`offline` still doesn't deduplicate results)
diff --git a/app/models/ci/runner.rb b/app/models/ci/runner.rb
index 0413bb480d40..815f0e43c6e0 100644
--- a/app/models/ci/runner.rb
+++ b/app/models/ci/runner.rb
@@ -87,7 +87,13 @@ class Runner < Ci::ApplicationRecord
     scope :active, -> (value = true) { where(active: value) }
     scope :paused, -> { active(false) }
-    scope :online, -> { where(arel_table[:contacted_at].gt(online_contact_time_deadline)) }
+    scope :online, -> do
+      where(arel_table[:contacted_at].gt(online_contact_time_deadline))
+        .where('EXISTS(?)',
+          RunnerManager.select(1)
+            .where(RunnerManager.arel_table[:runner_id].eq(arel_table[:id]))
+            .limit(1))
+    end
     scope :recent, -> do
       timestamp = stale_deadline
@@ -99,7 +105,11 @@ class Runner < Ci::ApplicationRecord
       where(arel_table[:created_at].lteq(timestamp))
         .where(arel_table[:contacted_at].eq(nil).or(arel_table[:contacted_at].lteq(timestamp)))
     end
-    scope :offline, -> { where(arel_table[:contacted_at].lteq(online_contact_time_deadline)) }
+    scope :offline, -> do
+      left_joins(:runner_managers)
+        .where(arel_table[:contacted_at].lteq(online_contact_time_deadline))
+        .or(left_joins(:runner_managers).where(ci_runner_machines: { runner_id: nil }))
+    end
     scope :never_contacted, -> { where(contacted_at: nil) }
     scope :ordered, -> { order(id: :desc) }
@@ -345,7 +355,7 @@ def display_name
     end
     def online?
-      contacted_at && contacted_at > self.class.online_contact_time_deadline
+      contacted_at && contacted_at > self.class.online_contact_time_deadline && runner_managers.any?
     end
     def stale?
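The note that `offline` still doesn't deduplicate results follows from the `LEFT JOIN` on `ci_runner_machines`: a runner with several runner managers yields one joined row per manager. The sketch below emulates that in memory with hypothetical `RunnerRow` and `ManagerRow` structs standing in for rows of `ci_runners` and `ci_runner_machines`; it only illustrates the SQL semantics and is not a proposal for how GitLab should deduplicate. For comparison, it also emulates the `EXISTS` condition from the proposed `online` scope, which behaves as a semi-join and returns each runner at most once.

```ruby
# Hypothetical in-memory rows standing in for ci_runners and ci_runner_machines.
RunnerRow = Struct.new(:id, :minutes_since_contact)
ManagerRow = Struct.new(:id, :runner_id)

TIMEOUT_MINUTES = 120 # the 2-hour online contact timeout

# Emulates `left_joins(:runner_managers)`: one row per (runner, manager) pair,
# or a single [runner, nil] row when the runner has no managers.
def left_join(runners, managers)
  runners.flat_map do |runner|
    matches = managers.select { |m| m.runner_id == runner.id }
    matches.empty? ? [[runner, nil]] : matches.map { |m| [runner, m] }
  end
end

# Emulates the proposed `offline` condition on the joined rows:
# contacted at or before the deadline, OR no matching runner manager row.
def offline_rows(rows, timeout_minutes)
  rows.select { |runner, manager| runner.minutes_since_contact >= timeout_minutes || manager.nil? }
end

# Emulates the proposed `online` scope: EXISTS is a semi-join, so each runner
# appears at most once no matter how many managers it has.
def online_runners(runners, managers, timeout_minutes)
  runners.select do |runner|
    runner.minutes_since_contact < timeout_minutes &&
      managers.any? { |m| m.runner_id == runner.id }
  end
end

runners = [
  RunnerRow.new(1, 180), # last contact 3 hours ago, two managers
  RunnerRow.new(2, 5),   # contacted recently, no runner managers
  RunnerRow.new(3, 5)    # contacted recently, two managers
]
managers = [ManagerRow.new(10, 1), ManagerRow.new(11, 1), ManagerRow.new(12, 3), ManagerRow.new(13, 3)]

rows = offline_rows(left_join(runners, managers), TIMEOUT_MINUTES)
ids = rows.map { |runner, _manager| runner.id } # runner 1 appears once per manager
deduplicated = rows.map(&:first).uniq(&:id)     # one entry per runner
online = online_runners(runners, managers, TIMEOUT_MINUTES)

puts "offline rows: #{ids.inspect}"
puts "offline runners after dedup: #{deduplicated.map(&:id).inspect}"
puts "online runners: #{online.map(&:id).inspect}"
```

Runner 1 shows up twice in the joined `offline` result because it has two managers, while runner 3 is returned once by the `EXISTS`-style check even though it also has two managers.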