Health Loop Bug Fix

Matthew Sevey requested to merge sevey/testing into master

MERGE REQUEST

MR Guidelines

Overview

us-east-upload was experiencing a bug where the health loop was not calling bubble on directories even though the AggregateLastHealthCheckTime was in August. On restart the health loop would occasionally call bubble but then quickly go to sleep again.

What I found was that large nodes such as us-east-upload often have a large number of pending bubbles. When the node stops, even with a clean shutdown, the directory metadata can be left out of sync with respect to the AggregateLastHealthCheckTime. For example, the /var/.siadir file had an AggregateLastHealthCheckTime of 2020-08-21T15:13:06.887427046Z while the var/skynet/.siadir file had an AggregateLastHealthCheckTime of 2020-08-21T15:15:06.887427046Z. Clearly there is still work to be done, but because the AggregateLastHealthCheckTime of the subfolder var/skynet is after that of the current folder var, the health loop stops descending and returns the LastHealthCheckTime of the var folder, which is 2020-10-06T20:17:05.47475Z, resulting in the health loop sleeping.

This change adds a condition to the loop that finds the oldest health check time: before skipping over a subdirectory, it now also checks that the current directory's LastHealthCheckTime is older than the subdirectory's AggregateLastHealthCheckTime. Additionally, I have added a regression test that fails on master.
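
To make the condition concrete, here is a minimal sketch of the skip check before and after this change, run against the timestamps quoted above. It is an illustration only, not the code in this MR: the `dirMetadata` struct and the `skipSubDirOld`, `skipSubDirNew`, and `mustParse` helpers are hypothetical names, and the real metadata type carries many more fields.

```go
package main

import (
	"fmt"
	"time"
)

// dirMetadata is a hypothetical, pared-down stand-in for the metadata kept in
// a .siadir file. Only the two fields discussed in this MR are modelled.
type dirMetadata struct {
	LastHealthCheckTime          time.Time
	AggregateLastHealthCheckTime time.Time
}

// skipSubDirOld sketches the behaviour described for master: a subdirectory
// is skipped whenever its aggregate time is after the current directory's
// aggregate time.
func skipSubDirOld(cur, sub dirMetadata) bool {
	return sub.AggregateLastHealthCheckTime.After(cur.AggregateLastHealthCheckTime)
}

// skipSubDirNew sketches the added condition: the subdirectory is only
// skipped if the current directory's own LastHealthCheckTime is also older
// than the subdirectory's aggregate time.
func skipSubDirNew(cur, sub dirMetadata) bool {
	return sub.AggregateLastHealthCheckTime.After(cur.AggregateLastHealthCheckTime) &&
		cur.LastHealthCheckTime.Before(sub.AggregateLastHealthCheckTime)
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339Nano, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Timestamps taken from the us-east-upload example in this description.
	cur := dirMetadata{ // var
		LastHealthCheckTime:          mustParse("2020-10-06T20:17:05.47475Z"),
		AggregateLastHealthCheckTime: mustParse("2020-08-21T15:13:06.887427046Z"),
	}
	sub := dirMetadata{ // var/skynet
		AggregateLastHealthCheckTime: mustParse("2020-08-21T15:15:06.887427046Z"),
	}

	fmt.Println("old condition skips var/skynet:", skipSubDirOld(cur, sub))
	fmt.Println("new condition skips var/skynet:", skipSubDirNew(cur, sub))
}
```

With these values the old check reports that var/skynet should be skipped, so the search stops at var and returns its October LastHealthCheckTime. The new check does not skip it, because var's LastHealthCheckTime is not older than var/skynet's AggregateLastHealthCheckTime, so the loop keeps descending toward the stale directory.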

Example for Visual Changes

Checklist

Review and complete the checklist to ensure that the MR is complete before it is assigned to an approver.

  • All new methods or updated methods have clear docstrings
  • Testing added or updated for new methods
  • Any new packages are added to Makefile and .gitlab-ci.yml
  • API documentation updated for API updates
  • Module README.md updated for changes to workflow
  • Issue added to Sia-UI repo for new supporting features
  • Changelog File Created

Issues Closed
