Fix TestStackDumping test freezing
What does this MR do?
The test was freezing intermittently because there was a race condition:
-
The test called
watchForGoroutinesDump, which set up a signal handler forUSR1that will sendtruevalue to a non-blocking channel (dumpedCh) when this happens. -
The test then issued the
USR1signal. -
The test listened on
dumpedCh.
If the USR1 signal were handled and sent a signal to dumpedCh
before the test was ready in step 3, the test could end up waiting
forever.
Fix this by making it possible to use a blocking channel to avoid this raciness.
Why was this MR needed?
Because pipelines were randomly failing, and this test took 10 minutes to fail. Example: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/8560361346
What's the best way to test this MR?
I couldn't reproduce locally, but I was able to demonstrate that this race condition can happen:
diff --git a/log/dump_unix.go b/log/dump_unix.go
index a6da24256..818695d21 100644
--- a/log/dump_unix.go
+++ b/log/dump_unix.go
@@ -3,6 +3,7 @@
package log
import (
+ "fmt"
"os"
"os/signal"
"runtime"
@@ -23,11 +24,13 @@ func watchForGoroutinesDump(logger *logrus.Logger, stopCh chan bool) (chan bool,
for {
select {
case <-dumpStacksCh:
+ fmt.Println("dumping stack now")
buf := make([]byte, 1<<20)
len := runtime.Stack(buf, true)
logger.Printf("=== received SIGUSR1 ===\n*** goroutine dump...\n%s\n*** end\n", buf[0:len])
nonBlockingSend(dumpedCh, true)
+ fmt.Println("sent signal now")
case <-stopCh:
close(finishedCh)
return
diff --git a/log/dump_unix_test.go b/log/dump_unix_test.go
index e108e283b..82fdd18bd 100644
--- a/log/dump_unix_test.go
+++ b/log/dump_unix_test.go
@@ -3,9 +3,11 @@
package log
import (
+ "fmt"
"os"
"syscall"
"testing"
+ "time"
"github.com/sirupsen/logrus"
"github.com/sirupsen/logrus/hooks/test"
@@ -27,7 +29,10 @@ func TestStackDumping(t *testing.T) {
require.NoError(t, err)
require.NoError(t, proc.Signal(syscall.SIGUSR1))
+ fmt.Println("sleeping")
+ time.Sleep(10 * time.Second)
<-dumpedCh
+ fmt.Println("received dumped channel now")
logrusOutput, err := hook.LastEntry().String()
require.NoError(t, err)
assert.Contains(t, logrusOutput, "=== received SIGUSR1 ===")
What are the relevant issue numbers?
Relates to #38325 (closed)