[Programming Task] Health and Stuck Loop Subsystem Refactor
PROGRAMMING TASK
Description of Task
Refactor the health and stuck loops into their own subsystems.
Reason or Need for Change
Improve clarity of code and renter subsystems to reduce hidden complexity.
Design / Proposal
TBD
Follow-ups from MRs:
The following discussion from !3563 (merged) should be addressed:
- @DavidVorick started a discussion: (+1 comment) This is another thing that can be broken out as a complexity / assumption: the repair loop will sleep and wait to be interrupted by new work, which means that new work being added will have to explicitly wake the repair loop up.
Currently we wait on multiple different channels to wake the repair loop, but if the effect for all of them is the same, maybe it's actually sufficient to consolidate them into a single channel. We can move the logging statements to the callsite pretty painlessly I believe.
The following discussion from !3807 (merged) should be addressed:
- @MSevey started a discussion: Creating discussion to be resolved into a follow-up issue for cleaning up the stuck loop into its own subsystem, as well as refactoring the code within threadedStuckFileLoop.
Health Loop and Stuck Loop should be split into their own subsystems.
The following discussion from !3807 (merged) should be addressed:
- @MSevey started a discussion: (+2 comments) One question I had was if it makes sense to move stuckChunkFound and stuckChunkSuccess to the stuckQueue now that we have one. That would help keep the stuck loop and the repair/upload loop controls separate.
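One possible shape for that move, as a rough sketch only. The field names `stuckChunkFound` and `stuckChunkSuccess` come from the discussion above; the channel types, the `string` stand-in for a file path, the buffer sizes, and the helper methods `signalFound` and `signalSuccess` are assumptions made for illustration, not the existing implementation.

```go
package main

// stuckQueue here is a simplified stand-in for the real stuck queue. It owns
// the two stuck-loop signals so that the repair/upload loop controls do not
// need to know about them.
type stuckQueue struct {
	// stuckChunkFound tells the stuck loop that a stuck chunk exists.
	stuckChunkFound chan struct{}
	// stuckChunkSuccess carries the path of a file whose stuck chunk was
	// repaired. A string is used here as a placeholder path type.
	stuckChunkSuccess chan string
}

func newStuckQueue() *stuckQueue {
	return &stuckQueue{
		stuckChunkFound:   make(chan struct{}, 1),
		stuckChunkSuccess: make(chan string, 1),
	}
}

// signalFound notifies the stuck loop without blocking; repeated signals
// collapse into one pending notification.
func (sq *stuckQueue) signalFound() {
	select {
	case sq.stuckChunkFound <- struct{}{}:
	default:
	}
}

// signalSuccess reports a repaired path without blocking. It returns false
// if the notification was dropped because one is already pending.
func (sq *stuckQueue) signalSuccess(path string) bool {
	select {
	case sq.stuckChunkSuccess <- path:
		return true
	default:
		return false
	}
}
```

Whether dropped success notifications are acceptable depends on how the stuck loop consumes them; a larger buffer or a blocking send may fit the real code better.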