Race condition: context cancellation vs exec done
Tracing down the source of some flaky test failures in my application (https://github.com/6RiverSystems/mmmbbb/), I discovered a race condition in the implementation of exec and how it handles a context which is canceled just after the statement completes.
The code causing the bug is the goroutine listening for context cancellation in exec: https://gitlab.com/cznic/sqlite/-/blob/master/sqlite.go#L497-502
The timeline of what happens:
- Statement 1 starts (exec is called) with context 1
- Statement 1 completes
- Context 1 is canceled
- Statement 2 starts with context 2
- The "listener" from Statement 1 sees both context 1 done and
donechclosed. Go randomly picks acaseto fire, and sometimes picks the "context done" case. This interrupts SQLite - Statement 2 is erroneously terminated with the
SQLITE_INTERRUPTerror code, even though its context is not canceled
I was able to confirm this is almost certainly the cause of my issues by adding this panic case to the code:
diff --git a/sqlite.go b/sqlite.go
index a484a2d..591aff8 100644
--- a/sqlite.go
+++ b/sqlite.go
@@ -496,6 +496,9 @@ func (s *stmt) exec(ctx context.Context, args []driver.NamedValue) (r driver.Res
go func() {
select {
case <-ctx.Done():
+ if _, ok := <-donech; !ok {
+ panic(fmt.Errorf("raced with donech"))
+ }
atomic.AddInt32(&done, 1)
s.c.interrupt(s.c.db)
case <-donech:
The panic triggered quite reliably for me.
Of course this panic code can easily be replaced by better code that will instead suppress the interrupt. Another option to fix this would be to also inc &done in the defer that closes donech and a few other key places, so that we are (more) assured we won't call interrupt if the current exec is not going to call into SQLite again.