
Flush BoltDB pages regularly

Luke Champine requested to merge flushdb into master

Since Bolt databases are mmap'd, reading them causes db pages to be paged into RAM. This memory is released to the OS when needed, but it can still be alarming for users to see Sia consuming gigabytes of memory. Typically this occurs after rescanning the blockchain (since this requires reading almost the full consensus.db), or after leaving Sia running for a while. This PR adds code to regularly flush the pages from RAM by re-mmap'ing the db file. The performance impact of this should be minimal; caching pages in RAM only helps if you read them more than once, but when rescanning the blockchain, we only read each block once.
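
To make the mechanism concrete, here is a rough standalone sketch of the idea (this is not the code in this PR): periodically closing and reopening the Bolt database discards the old mapping, so the OS reclaims the cached pages. Real code would need to coordinate with in-flight transactions; the reopen helper and the one-minute interval are purely illustrative.

package main

import (
	"log"
	"time"

	"github.com/boltdb/bolt"
)

// reopen closes the database and opens it again, discarding the old mmap so
// the OS can reclaim the pages it had cached for the file.
func reopen(db *bolt.DB, path string) (*bolt.DB, error) {
	if err := db.Close(); err != nil {
		return nil, err
	}
	return bolt.Open(path, 0600, nil)
}

func main() {
	const path = "consensus.db"
	db, err := bolt.Open(path, 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	// Periodically re-mmap the file so cached pages are returned to the OS.
	for range time.Tick(time.Minute) {
		if db, err = reopen(db, path); err != nil {
			log.Fatal(err)
		}
	}
}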

Some stats (all performed on SSD):

              Unlock time   Unlock RSS   Sweep time   Sweep RSS
No flushing   5m 30s        2700 MB      5m 0s        2700 MB
Flushing      6m 10s        100 MB       5m 45s       100 MB

(Side note: most of the remaining 100 MB is due to muxado streams and the hostdb. We can focus on those later.)

I also identified a memory leak caused by a bug in the consensusset.Unsubscribe code. The consensus set has a field, cs.subscribers, that holds a pointer to each subscriber. When unsubscribing, the code avoids allocating a new slice by reusing the existing slice capacity:

for i := range cs.subscribers {
	if cs.subscribers[i] == subscriber {
		cs.subscribers = append(cs.subscribers[0:i], cs.subscribers[i+1:]...)
		break
	}
}

Unfortunately, this results in a memory leak when the subscriber is at the end of the slice. Even though the slice no longer contains the subscriber, the slice is just a view of the underlying array, which does contain the pointer. If the subscriber were in the middle of the slice, it would be overwritten by the append, so there's no memory leak. But when it's at the end, it doesn't get overwritten. Initially I thought I could fix this like so:

cs.subscribers = append(cs.subscribers[0:i], cs.subscribers[i+1:]...)
cs.subscribers = cs.subscribers[:len(cs.subscribers):len(cs.subscribers)]

That is, by reducing the capacity of cs.subscribers, the subscriber "past the end" of the slice should become unreachable, and therefore be freed by the GC. But when I ran the test, this didn't happen! It turns out that Go's GC does not (currently) handle this case. The problem is that, even though we reduced the capacity of the cs.subscribers slice, there may be another slice floating around somewhere that points to the same underlying array. So the only way to free an element would be for the GC to prove that no slice is pointing to the element, and apparently this requires a prohibitive amount of liveness analysis.
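
Here's a small standalone program (not Sia code) that illustrates the point: as long as any other slice shares the backing array, the removed pointer stays reachable, so the GC can't free it no matter how we reslice cs.subscribers.

package main

import "fmt"

type subscriber struct{ name string }

func main() {
	subs := []*subscriber{{"a"}, {"b"}, {"c"}}
	alias := subs // another slice sharing the same backing array

	// Remove the last element and shrink the capacity, as in the attempted fix.
	i := 2
	subs = append(subs[:i], subs[i+1:]...)
	subs = subs[:len(subs):len(subs)]

	// The removed *subscriber is still reachable through alias, so the GC
	// cannot collect it even though subs can no longer see it.
	fmt.Println(len(subs), alias[2].name) // prints "2 c"
}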

Anyway, there's a simple fix, which is to set the entry in the slice to nil. This allows the element to be freed, and we don't have to allocate a new subscribers slice.
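
For reference, a sketch of the fixed loop (the actual change in this PR may differ in the details): shift the remaining subscribers down, nil out the vacated slot, and shrink the length.

for i := range cs.subscribers {
	if cs.subscribers[i] == subscriber {
		// Shift the remaining subscribers down, then clear the now-unused
		// last slot so the removed subscriber can be garbage collected.
		copy(cs.subscribers[i:], cs.subscribers[i+1:])
		cs.subscribers[len(cs.subscribers)-1] = nil
		cs.subscribers = cs.subscribers[:len(cs.subscribers)-1]
		break
	}
}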
