Health Loop Bug Fix
MERGE REQUEST
Overview
us-east-upload was experiencing a bug where the health loop was not calling bubble on directories even though the AggregateLastHealthCheckTime was in August. On restart the health loop would occasionally call bubble but then quickly go back to sleep.
What I found is that large nodes such as us-east-upload often have a large number of pending bubbles. When the node stops, even with a clean shutdown, the directory metadata can be left out of sync with respect to the AggregateLastHealthCheckTime. For example, the /var/.siadir file had an AggregateLastHealthCheckTime of 2020-08-21T15:13:06.887427046Z while the var/skynet/.siadir file had an AggregateLastHealthCheckTime of 2020-08-21T15:15:06.887427046Z. Clearly there is still work to be done, but because the AggregateLastHealthCheckTime of the sub folder var/skynet is after that of the current folder var, the health loop quits and returns the LastHealthCheckTime of the var folder, which is 2020-10-06T20:17:05.47475Z, resulting in the health loop sleeping.
This change adds a condition to the loop that finds the oldest time: the directory's LastHealthCheckTime must also be older than the sub directory's AggregateLastHealthCheckTime before we skip over the sub directory. Additionally, I have added a regression test that fails on master.
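The skip condition described above can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: the dirMetadata struct and the functions skipBeforeFix, skipAfterFix, and mustParse are hypothetical names, and the pre-fix check is assumed to be a plain comparison of the two aggregate times. The timestamps are the ones from the us-east-upload example.

```go
package main

import (
	"fmt"
	"time"
)

// dirMetadata is a hypothetical, stripped-down stand-in for the metadata
// stored in a .siadir file. Only the fields discussed in this MR are kept.
type dirMetadata struct {
	Path                         string
	LastHealthCheckTime          time.Time
	AggregateLastHealthCheckTime time.Time
}

// skipBeforeFix models the assumed old behaviour: a sub directory is skipped
// as soon as its aggregate time is newer than the current directory's.
func skipBeforeFix(dir, subDir dirMetadata) bool {
	return subDir.AggregateLastHealthCheckTime.After(dir.AggregateLastHealthCheckTime)
}

// skipAfterFix adds the extra condition described above: the directory's own
// LastHealthCheckTime must also be older than the sub directory's aggregate
// time before the sub directory is skipped.
func skipAfterFix(dir, subDir dirMetadata) bool {
	subDirIsNewer := subDir.AggregateLastHealthCheckTime.After(dir.AggregateLastHealthCheckTime)
	dirIsOlder := dir.LastHealthCheckTime.Before(subDir.AggregateLastHealthCheckTime)
	return subDirIsNewer && dirIsOlder
}

// mustParse is a small helper for the example; it panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339Nano, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	dir := dirMetadata{
		Path:                         "var",
		LastHealthCheckTime:          mustParse("2020-10-06T20:17:05.47475Z"),
		AggregateLastHealthCheckTime: mustParse("2020-08-21T15:13:06.887427046Z"),
	}
	subDir := dirMetadata{
		Path:                         "var/skynet",
		AggregateLastHealthCheckTime: mustParse("2020-08-21T15:15:06.887427046Z"),
	}

	fmt.Printf("skip %s before fix: %t\n", subDir.Path, skipBeforeFix(dir, subDir))
	fmt.Printf("skip %s after fix:  %t\n", subDir.Path, skipAfterFix(dir, subDir))
}
```

With these values the old check skips var/skynet, while the new check does not, because the LastHealthCheckTime of var (October) is not older than the AggregateLastHealthCheckTime of var/skynet (August). The search therefore descends into the sub directory instead of returning the October time.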
Example for Visual Changes
Checklist
Review and complete the checklist to ensure that the MR is complete before it is assigned to an approver.
- All new methods or updated methods have clear docstrings
- Testing added or updated for new methods
- Any new packages are added to Makefile and .gitlab-ci.yml
- API documentation updated for API updates
- Module README.md updated for changes to workflow
- Issue added to Sia-UI repo for new supporting features
- Changelog File Created