A Ragtag Band of Changes

David Vorick requested to merge deep-testing into master

This is a work in progress. Originally this PR was aimed at creating a broad test that spins up multiple server testers and links them all together, but getting the synchronization right proved problematic. While digging into ways to get things to synchronize, I found a pretty serious synchronization issue in the host. Further down the rabbit hole, I ended up modifying sync.ThreadGroup pretty heavily.

I'm opening this PR early so that you guys can review the changes to the thread group and give your feedback. Almost nothing here has dedicated tests yet; the code is exercised by the existing tests, which all pass.

Highlights of the most important changes and why I made them:

  1. OnStop got broken into BeforeStop and AfterStop. BeforeStop functions are called after the thread group is closed but before wg.Wait() is called; AfterStop functions are called after wg.Wait() returns. There's a pretty clear use case for each. Previously, OnStop had the same behavior as BeforeStop, which has issues: if you are using the thread group to close the logger, there may be threads still running (not yet finished, because we haven't called Wait) that depend on the logger. At the other end of the spectrum is closing the listener, which, to function correctly, definitely needs to happen before wg.Wait.
  2. Got rid of IsStopped, because it wasn't really in use (only the CPU miner was using it) and because it interfered with some of the locking changes I made. The locking changes were necessary to get the concurrency model correct, though I haven't yet written the tests that demonstrate this. We can bring IsStopped back if it's really needed, but for the time being StopChan has been enough.
  3. Added the Pause and Resume functions. These are going to be very useful during testing, and will let us break free of the rigid, restrictive concurrency pattern that the modules are currently forced to follow. The biggest reason we enforced that pattern is that it was very hard to test the modules when they were all at different heights and you had no way of knowing when they would all be done processing the most recent block. Now you can use Pause and Resume to guarantee that all chain-connected events within a module have completed.

For example, in the host, storage obligations are submitted to the transaction pool asynchronously. But because the submission goroutine is tracked by the thread group, a single call ties it back to the goroutine that spawned it: if you call host.tg.Pause after an action that should spawn the obligation-submission goroutine, you know that the goroutine will have finished by the time Pause stops blocking.

I predict that this will be enormously helpful during testing.

While Pause is in effect, no thread or module can get through a tg.Add call; Add will block until Resume is called.

Getting Pause working required adding two more functions, AddPermanent and DonePermanent. Threads such as the listener threads are spawned only once, and they would interfere with Pause because they do not release their waitgroup slot until Stop is called and the thread is killed. We don't want to kill them during Pause, so we actually put them in a separate waitgroup. There are probably alternate ways to achieve the same goal.

That puts the current ThreadGroup implementation at 10 exported functions, which is definitely on the high side. That said, I feel pretty strongly about Pause and Resume; they are going to be very helpful for testing asynchronous actions.


Getting my simulated-full-network test working is going to require adding proper pausing to the consensus set, which was part of the inspiration for making this change. I also needed it to properly test the host.


For the rest of this PR: the host still needs review, especially around the storage obligation handling and the action item handling. I want to get the ecosystem test I wrote passing, and I definitely need to add proper tests for the ThreadGroup changes and for the TryLock that I created.

Following this PR, I will look further into either the safety of the tbuilder or the dependency sending from the renter for the renter-host protocol. The renter is very likely not sending all dependent transactions, and will very likely need an extension to the transaction pool to be able to make those sends.
