
Proper offline checking in HostDB

Luke Champine requested to merge online into master

Previously, the renter determined whether a host was offline using the isOffline function shown in the diff below, which relied on the HostDB's ActiveHosts and AllHosts methods. This function assumed that a host would not appear in AllHosts until it had been scanned, and commit 16479f36 changed the HostDB to guarantee exactly that.

However, this change came with a hidden cost: 126,000 extra goroutines at startup. When the HostDB scans the blockchain, it calls insertHost on every host announcement it sees. If the host is already in allHosts, nothing is done; if not, the host is added to the scan pool. To avoid blocking, the send happens in a new goroutine, onto a buffered channel (scanPool). But scanning a host is much slower than loading the blockchain: many thousands of host announcements may be processed in the time it takes to scan a single host.

So the sequence looked something like this: the HostDB would see a host announcement, check whether the host was in allHosts (it wasn't), and add it to the scan pool. Then it would process another announcement with the same address and check allHosts again. But the scan triggered by the first announcement would not have completed yet, so the host still wouldn't be in allHosts, and the HostDB would add the new announcement to the scan pool as well. Since scanPool was buffered to only 1000 entries, nearly all of those goroutines would block on the send, putting considerable strain on the scheduler.

This PR reverts this behavior, so that hosts are added to allHosts before they are scanned. Because duplicate announcements are now caught, we end up scanning only 700 announcements instead of 126,000. These fit within scanPool's buffer, so there should be no blocked goroutines at all, at least until the network grows beyond 1000 hosts.

As a result, it was necessary to implement a proper IsOffline method on the HostDB, since we can no longer assume that every host in AllHosts has been scanned. Fortunately, this was a pretty easy fix. It didn't even break any HostDB tests, though that is mostly because HostDB test coverage is rather sparse. Some of the new tests in #923 will be broken by this change.
