Health loop batching
MERGE REQUEST
Overview
This MR introduces an improvement to the health loop by batching the updates into subtrees.
Originally, the health loop used managedOldestHealthCheckTime to find the directory with the oldest LastHealthCheckTime and then called bubble on that directory. Updating the Skynet portals to a filesystem structure of 2byte/2byte/2byte/file highlighted the inefficiency of this approach: CPU usage was consistently very high, and it was largely attributed to managedOldestHealthCheckTime.
By looking for a subtree instead of a single directory, we save CPU time and can use uniqueRefreshPaths more efficiently to update the subtree containing the oldest LastHealthCheckTime.
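The subtree search can be pictured with a rough, self-contained sketch. This is not the code in this MR: dirNode, aggregateOldest, oldestSubtree, collectDirs, and the maxDirs batch limit are hypothetical names chosen for illustration, and the sketch recomputes aggregate times on the fly where the real filesystem would presumably read them from directory metadata.

```go
package main

import (
	"fmt"
	"time"
)

// dirNode is a hypothetical in-memory stand-in for a directory's metadata.
type dirNode struct {
	path            string
	lastHealthCheck time.Time // when this directory itself was last checked
	children        []*dirNode
}

// aggregateOldest returns the oldest lastHealthCheck in the subtree rooted at n.
// Assumption: a real implementation would read a stored aggregate value instead.
func aggregateOldest(n *dirNode) time.Time {
	oldest := n.lastHealthCheck
	for _, c := range n.children {
		if t := aggregateOldest(c); t.Before(oldest) {
			oldest = t
		}
	}
	return oldest
}

// subtreeSize counts the directories in the subtree rooted at n, including n.
func subtreeSize(n *dirNode) int {
	size := 1
	for _, c := range n.children {
		size += subtreeSize(c)
	}
	return size
}

// oldestSubtree walks from root toward the oldest health check time and stops
// once the current subtree holds at most maxDirs directories, has no children,
// or the current directory itself is at least as old as anything below it.
func oldestSubtree(root *dirNode, maxDirs int) *dirNode {
	cur := root
	for subtreeSize(cur) > maxDirs && len(cur.children) > 0 {
		best := cur.children[0]
		bestTime := aggregateOldest(best)
		for _, c := range cur.children[1:] {
			if t := aggregateOldest(c); t.Before(bestTime) {
				best, bestTime = c, t
			}
		}
		if !bestTime.Before(cur.lastHealthCheck) {
			return cur // the current directory holds the oldest time
		}
		cur = best
	}
	return cur
}

// collectDirs lists every directory path in the subtree in pre-order; this is
// the batch that would be refreshed together.
func collectDirs(n *dirNode) []string {
	paths := []string{n.path}
	for _, c := range n.children {
		paths = append(paths, collectDirs(c)...)
	}
	return paths
}

// exampleTree builds a small tree whose oldest check time sits in /aa/01.
func exampleTree() *dirNode {
	base := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	at := func(h int) time.Time { return base.Add(time.Duration(h) * time.Hour) }
	return &dirNode{path: "/", lastHealthCheck: at(10), children: []*dirNode{
		{path: "/aa", lastHealthCheck: at(9), children: []*dirNode{
			{path: "/aa/01", lastHealthCheck: at(1), children: []*dirNode{
				{path: "/aa/01/ff", lastHealthCheck: at(2)},
			}},
			{path: "/aa/02", lastHealthCheck: at(8)},
		}},
		{path: "/bb", lastHealthCheck: at(5), children: []*dirNode{
			{path: "/bb/01", lastHealthCheck: at(6)},
		}},
	}}
}

func main() {
	sub := oldestSubtree(exampleTree(), 2)
	fmt.Println(sub.path, collectDirs(sub))
}
```

With a batch limit of 2 directories, the sketch selects /aa/01 and batches it together with /aa/01/ff, so one search covers two directories instead of one. A larger limit stops the descent higher in the tree and batches more directories per search.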
This branch was tested on us-east-upload, where the CPU load showed more periods of <100% CPU usage than on master. Additionally, it was discovered that master was not able to update the filesystem fast enough. The target is to update the entire filesystem within the healthLoopCheck, which is currently 1hr. On this branch it appears we can get within 6 days but no closer. On master we are beginning to see that degrade: after running for the weekend, the AggregateLastHealthCheckTime had slipped back to 7 days in the past.
Example for Visual Changes
Checklist
Review and complete the checklist to ensure that the MR is complete before it is assigned to an approver.
- All new methods or updated methods have clear docstrings
- Testing added or updated for new methods
- Any new packages are added to Makefile and .gitlab-ci.yml
- API documentation updated for API updates
- Module README.md updated for changes to workflow
- Issue added to Sia-UI repo for new supporting features
- Changelog File Created
Issues Closed
Related to #4399 (closed)