Matthew Sevey requested to merge sevey/health-loop-batching into master Nov 02, 2020

MERGE REQUEST

Overview

This MR introduces an improvement to the health loop by batching the updates into subtrees.

Originally, the health loop used managedOldestHealthCheckTime to find the directory with the oldest LastHealthCheckTime and then called bubble on that directory. Updating the Skynet portals to use a 2byte/2byte/2byte/file filesystem structure highlighted how inefficient this is: CPU usage was consistently very high, and most of it was attributed to managedOldestHealthCheckTime.

By searching for a subtree instead of a single directory, we save CPU time and can use uniqueRefreshPaths more efficiently to update the subtree containing the oldest LastHealthCheckTime.

This branch was tested on us-east-upload, where the CPU load showed more periods below 100% usage than on master. Testing also revealed that master could not update the filesystem fast enough. The target is to update the entire filesystem within the healthLoopCheck interval, which is currently 1 hour. On this branch it appears we can get the AggregateLastHealthCheckTime to within 6 days of the present, but no closer. On master we began to see this degrade, and after running over the weekend the AggregateLastHealthCheckTime had slipped back to 7 days in the past.

Example for Visual Changes

Checklist

Review and complete the checklist to ensure that the MR is complete before it is assigned to an approver.

All new methods or updated methods have clear docstrings
Testing added or updated for new methods
Any new packages are added to Makefile and .gitlab-ci.yml
API documentation updated for API updates
Module README.md updated for changes to workflow
Issue added to Sia-UI repo for new supporting features
Changelog File Created

Issues Closed

Related to #4399 (closed)

Edited Nov 02, 2020 by Matthew Sevey

Health loop batching