Fix TestStackDumping test freezing

What does this MR do?

The test was freezing intermittently because there was a race condition:

  1. The test called watchForGoroutinesDump, which set up a signal handler for USR1 that will send true value to a non-blocking channel (dumpedCh) when this happens.

  2. The test then issued the USR1 signal.

  3. The test listened on dumpedCh.

If the USR1 signal were handled and sent a signal to dumpedCh before the test was ready in step 3, the test could end up waiting forever.

Fix this by making it possible to use a blocking channel to avoid this raciness.

Why was this MR needed?

Because pipelines were randomly failing, and this test took 10 minutes to fail. Example: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/8560361346

What's the best way to test this MR?

I couldn't reproduce locally, but I was able to demonstrate that this race condition can happen:

diff --git a/log/dump_unix.go b/log/dump_unix.go
index a6da24256..818695d21 100644
--- a/log/dump_unix.go
+++ b/log/dump_unix.go
@@ -3,6 +3,7 @@
 package log

 import (
+        "fmt"
        "os"
        "os/signal"
        "runtime"
@@ -23,11 +24,13 @@ func watchForGoroutinesDump(logger *logrus.Logger, stopCh chan bool) (chan bool,
                for {
                        select {
                        case <-dumpStacksCh:
+                               fmt.Println("dumping stack now")
                                buf := make([]byte, 1<<20)
                                len := runtime.Stack(buf, true)
                                logger.Printf("=== received SIGUSR1 ===\n*** goroutine dump...\n%s\n*** end\n", buf[0:len])

                                nonBlockingSend(dumpedCh, true)
+                               fmt.Println("sent signal now")
                        case <-stopCh:
                                close(finishedCh)
                                return
diff --git a/log/dump_unix_test.go b/log/dump_unix_test.go
index e108e283b..82fdd18bd 100644
--- a/log/dump_unix_test.go
+++ b/log/dump_unix_test.go
@@ -3,9 +3,11 @@
 package log

 import (
+        "fmt"
        "os"
        "syscall"
        "testing"
+       "time"

        "github.com/sirupsen/logrus"
        "github.com/sirupsen/logrus/hooks/test"
@@ -27,7 +29,10 @@ func TestStackDumping(t *testing.T) {
        require.NoError(t, err)
        require.NoError(t, proc.Signal(syscall.SIGUSR1))

+       fmt.Println("sleeping")
+       time.Sleep(10 * time.Second)
        <-dumpedCh
+       fmt.Println("received dumped channel now")
        logrusOutput, err := hook.LastEntry().String()
        require.NoError(t, err)
        assert.Contains(t, logrusOutput, "=== received SIGUSR1 ===")

What are the relevant issue numbers?

Relates to #38325 (closed)

Edited by Stan Hu

Merge request reports

Loading