Change update interval of runners when trying to preserve contacted_at
What does this MR do?
Changes how often we update the ci_runners table. With this change, the table is updated 60 times less often.
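The effect of writing contacted_at less often can be sketched as follows. This is a minimal Python illustration, not the actual GitLab code: the `Runner` class, the `touch_contacted_at` and `simulate` helpers, the polling period, and both interval values are hypothetical, chosen only so that their ratio matches the 60x reduction described above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical intervals in seconds; the real values are not stated in this MR text.
OLD_INTERVAL = 60
NEW_INTERVAL = 60 * 60


@dataclass
class Runner:
    contacted_at: Optional[float] = None
    writes: int = 0  # counts simulated UPDATE statements on ci_runners


def touch_contacted_at(runner: Runner, now: float, interval: float) -> bool:
    """Persist contacted_at only if the stored value is at least `interval` seconds old."""
    if runner.contacted_at is None or now - runner.contacted_at >= interval:
        runner.contacted_at = now
        runner.writes += 1
        return True
    return False


def simulate(interval: float, poll_every: float = 5, duration: float = 24 * 3600) -> int:
    """Count the writes caused by one runner polling every `poll_every` seconds."""
    runner = Runner()
    now = 0.0
    while now < duration:
        touch_contacted_at(runner, now, interval)
        now += poll_every
    return runner.writes
```

Under these assumed values, a runner that polls every 5 seconds for a day causes 1440 writes with the old interval and 24 with the new one.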
Are there points in the code the reviewer needs to double check?
Maybe we could move contacted_at out of the database and store it in Redis persistent storage.
This would probably be far more efficient than updating ci_runners.
Why was this MR needed?
We see a large amount of vacuuming on the ci_runners table. contacted_at is the only value we update constantly, so this seems to be the reason.
Drawbacks
We have a stuck runners feature.
This change makes that feature less useful, because 2 hours will have to pass before we discover that a runner is dead.
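The trade-off can be illustrated with a short Python sketch. It assumes, hypothetically, that a runner is treated as alive while its stored contacted_at is newer than a fixed threshold. The 2-hour value comes from the text above; the function names and the exact comparison are illustrative and not taken from the GitLab code base.

```python
from typing import Optional

ONLINE_THRESHOLD = 2 * 60 * 60  # seconds; the 2 hours mentioned above


def is_online(contacted_at: Optional[float], now: float,
              threshold: float = ONLINE_THRESHOLD) -> bool:
    """A runner counts as online while its last recorded contact is recent enough."""
    return contacted_at is not None and now - contacted_at < threshold


def detection_delay(last_write: float, died_at: float,
                    threshold: float = ONLINE_THRESHOLD) -> float:
    """Seconds between the runner dying and is_online() turning False."""
    if died_at < last_write:
        raise ValueError("the runner cannot die before its last recorded contact")
    return max(0.0, last_write + threshold - died_at)
```

In this model the worst case is a runner that dies right after contacted_at was written: it keeps looking healthy for the full threshold. The threshold also cannot be shorter than the write interval, or healthy runners would appear dead between writes.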
Does this MR meet the acceptance criteria?
- CHANGELOG entry added
- Documentation created/updated
- API support added
- Tests
  - Added for this feature/bug
  - All builds are passing
- Conform by the style guides
- Branch has no merge conflicts with master (if you do - rebase it please)
- Squashed related commits together
What are the relevant issue numbers?
Activity
mentioned in issue #21698 (closed)
Added 315 commits:
- 03d9631c...1d548869 - 314 commits from branch master
- 6a29ac7d - Change update interval of runners when trying to preserve contacted_at
Reassigned to @yorickpeterse
Milestone changed to %8.12
@ayufan Redis may also be an option, but let's start with this and see how things work out.
mentioned in commit 575a9747
Marked the task CHANGELOG entry added as completed
Marked the task Documentation created/updated as completed
Marked the task Conform by the style guides as completed
Marked the task Squashed related commits together as completed
Mentioned in commit muteor/gitlab-ce@575a9747
Mentioned in merge request gitlab-com/www-gitlab-com!3202 (merged)
Mentioned in issue #21277 (closed)
This change has a pretty big negative impact on /api/v4/runners/:id. The contacted_at field appears to be updated only about once per hour (the 2h interval from this MR has probably been reduced since).
Can we get contacted_at moved somewhere more performant so that it could be updated every 1m without causing performance issues? #38265 (comment 55320316) made a proposal to this effect.
This would be very valuable from an infrastructure monitoring perspective. Waiting ~1h to be alerted that a runner went down (or worse, waiting for builds to fail as a result, if it was an important runner) is a very bad user experience from the system/network administration perspective.
Thanks in advance for any advice on how to resolve this issue.
@ayufan Any feedback on this comment? Related issue: gitlab#258809
I believe that we might have a ~bug there: gitlab#258809 (comment 428843971).
mentioned in issue gitlab#258809