Possible regression introduced by MR !849 for Windows pruning nodes only.
After reviewing the code again for !849 (merged) -- it turns out a potential
regression was introduced in that MR. If you recall, that MR decided not to
keep cs_main held while it was reading block and/or undo data from disk.
This is normally not a problem on non-pruning nodes, since these files never
go away once they exist.
However, on pruning nodes -- they may get deleted in parallel while the
getblock or getblockstats RPCs run.
Now, this wouldn't be a problem on a unix-like system either, since such systems support unlinking files while they are open (the file only gets physically removed from the internal filesystem data structures when the last file descriptor to it is closed).
However, on Windows the above is not the case! What's worse is that on a
Windows pruning node, the FlushStateToDisk internal function in
validation.cpp, which does the pruning of block files, may end up throwing
an exception if it fails to delete the block/undo file in question (if just
the right unlucky timing happens whereby a getblock or getblockstats call
happens to be touching a file it wants to delete at precisely the right
moment).
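
To make the difference in delete semantics concrete, here is a minimal standalone sketch. It is not code from the node: RemoveWhileOpen is a hypothetical helper, and std::ifstream / std::filesystem::remove merely stand in for whatever the RPC and pruning code actually use. On a unix-like system I would expect the delete to succeed while the reader keeps working; on Windows, with a handle opened this way, I would expect the delete to fail.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

struct UnlinkResult {
    bool removed;      // did the delete succeed while the file was still open?
    std::string data;  // what the still-open stream could read afterwards
};

// Hypothetical helper, for illustration only (not part of the node's code).
UnlinkResult RemoveWhileOpen(const fs::path &path) {
    {
        // Create a small file to play the role of a block/undo file.
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out << "blockdata";
    }
    // Stand-in for an RPC thread that has the file open for reading.
    std::ifstream in(path, std::ios::binary);

    // Stand-in for the pruner deleting the file in parallel.
    // The error_code overload reports failure instead of throwing.
    std::error_code ec;
    const bool removed = fs::remove(path, ec);

    // The reader continues using its already-open stream.
    std::string data;
    std::getline(in, data);
    in.close();

    // If the delete was refused while the file was open, clean up now.
    if (!removed) fs::remove(path, ec);
    return {removed, data};
}
```

Calling RemoveWhileOpen on a path in a temp directory and inspecting the removed flag on each platform should show the asymmetry described above.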
The problem then would be that on a Windows pruning node that somehow is
also serving up RPC getblock and getblockstats requests (not a likely
scenario) -- the exception thrown in FlushStateToDisk could cause the node
to shut down unexpectedly.
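
As a rough illustration of why a failed delete can escalate like this, the sketch below contrasts a throwing delete with one that reports failure through an error code. Both function names are hypothetical, and I am assuming (not asserting) that the pruning path uses a throwing filesystem call. To force a failure portably, the sketch tries to delete a non-empty directory, which is only a stand-in for a block file that Windows refuses to delete.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical: throwing style. A failed delete surfaces as
// std::filesystem::filesystem_error; if nothing up the call stack
// handles it sensibly, the process may end up shutting down.
void PruneOrThrow(const fs::path &p) {
    fs::remove(p);
}

// Hypothetical: non-throwing style. A failed delete is reported to the
// caller, which could log it and retry on a later pruning pass.
bool PruneNoThrow(const fs::path &p, std::string &err) {
    std::error_code ec;
    fs::remove(p, ec);
    if (ec) {
        err = ec.message();
        return false;
    }
    return true;
}
```

Whether switching to the second style is the right fix here is a separate question; the point is only that the two styles fail very differently when the delete is refused.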
I was unable to reproduce this regression here -- but reading the code I am convinced it can happen.
In summary this regression affects only a node that meets all of the following criteria:
- Windows
- Pruning node
- Serving up RPC getblock and getblockstats
- Horribly unlucky timing