
Health loop batching

Matthew Sevey requested to merge sevey/health-loop-batching into master

MERGE REQUEST

MR Guidelines

Overview

This MR introduces an improvement to the health loop by batching the updates into subtrees.

Originally, the health loop used managedOldestHealthCheckTime to find the directory with the oldest LastHealthCheckTime and then called bubble on that directory. Updating the skynet portals to a filesystem structure of 2byte/2byte/2byte/file highlighted how inefficient this is: CPU usage was consistently very high, and most of it was attributed to managedOldestHealthCheckTime.

By looking for a subtree instead of a single directory, we save CPU time and can use uniqueRefreshPaths more efficiently to update the subtree that contains the oldest LastHealthCheckTime.

This branch was tested on us-east-upload, where CPU load showed more periods of <100% usage than on master. Testing also revealed that master is not able to update the filesystem fast enough. The target is to update the entire filesystem within the healthLoopCheck interval, which is currently 1hr. On this branch it appears we can get within 6 days but no closer. On master we are beginning to see that degrade: after running over the weekend, the AggregateLastHealthCheckTime had slipped back to 7 days in the past.

Example for Visual Changes

Checklist

Review and complete the checklist to ensure that the MR is complete before it is assigned to an approver.

  • All new methods or updated methods have clear docstrings
  • Testing added or updated for new methods
  • Any new packages are added to Makefile and .gitlab-ci.yml
  • API documentation updated for API updates
  • Module README.md updated for changes to workflow
  • Issue added to Sia-UI repo for new supporting features
  • Changelog File Created

Issues Closed

Related to #4399 (closed)

Edited by Matthew Sevey
