Flaky data race in TestBuildCancel for custom executor tests
Overview
stack trace
=== RUN TestBuildCancel/bash
Running with gitlab-runner development version (HEAD)
Preparing the "custom" executor
Custom Executor binary - "config" stage
Mocking execution of: []
Using Custom executor...
Custom Executor binary - "prepare" stage
Mocking execution of: []
PREPARE doesn't accept any arguments. It just does its job
Preparing environment
Custom Executor binary - "run" stage
Mocking execution of: [/tmp/custom-executor639724443/script820094014/script. prepare_script]
RUN accepts two arguments: the path to the script to execute and the stage of the job
Executing: &exec.Cmd{Path:"/bin/bash", Args:[]string{"bash", "/tmp/custom-executor639724443/script820094014/script."}, Env:[]string(nil), Dir:"", Stdin:io.Reader(nil), Stdout:(*bytes.Buffer)(0xc00007c180), Stderr:(*bytes.Buffer)(0xc00007c180), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(nil), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), lookPathErr:error(nil), finished:false, childFiles:[]*os.File(nil), closeAfterStart:[]io.Closer(nil), closeAfterWait:[]io.Closer(nil), goroutine:[]func() error(nil), errch:(chan error)(nil), waitDone:(chan struct {})(nil)}
>>>>>>>>>>
Running on runner-pvr9xbdq-project-250833-concurrent-0...
<<<<<<<<<<
Getting source from Git repository
Custom Executor binary - "run" stage
Mocking execution of: [/tmp/custom-executor639724443/script381370501/script. get_sources]
RUN accepts two arguments: the path to the script to execute and the stage of the job
Executing: &exec.Cmd{Path:"/bin/bash", Args:[]string{"bash", "/tmp/custom-executor639724443/script381370501/script."}, Env:[]string(nil), Dir:"", Stdin:io.Reader(nil), Stdout:(*bytes.Buffer)(0xc00007c180), Stderr:(*bytes.Buffer)(0xc00007c180), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(nil), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), lookPathErr:error(nil), finished:false, childFiles:[]*os.File(nil), closeAfterStart:[]io.Closer(nil), closeAfterWait:[]io.Closer(nil), goroutine:[]func() error(nil), errch:(chan error)(nil), waitDone:(chan struct {})(nil)}
>>>>>>>>>>
Fetching changes...
Initialized empty Git repository in /tmp/gitlab-runner-custom-executor-test216545836/builds/project-0/.git/
Created fresh repository.
Checking out 91956efe as master...
Skipping Git submodules setup
<<<<<<<<<<
Executing "step_script" stage of the job script
Custom Executor binary - "run" stage
Mocking execution of: [/tmp/custom-executor639724443/script285765152/script. step_script]
RUN accepts two arguments: the path to the script to execute and the stage of the job
Executing: &exec.Cmd{Path:"/bin/bash", Args:[]string{"bash", "/tmp/custom-executor639724443/script285765152/script."}, Env:[]string(nil), Dir:"", Stdin:io.Reader(nil), Stdout:(*bytes.Buffer)(0xc00007c180), Stderr:(*bytes.Buffer)(0xc00007c180), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(nil), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), lookPathErr:error(nil), finished:false, childFiles:[]*os.File(nil), closeAfterStart:[]io.Closer(nil), closeAfterWait:[]io.Closer(nil), goroutine:[]func() error(nil), errch:(chan error)(nil), waitDone:(chan struct {})(nil)}
==================
WARNING: DATA RACE
Read at 0x00c0000ca170 by goroutine 56:
gitlab.com/gitlab-org/gitlab-runner/executors/custom_test.TestBuildCancel.func1.1()
/builds/gitlab-org/gitlab-runner/executors/custom/integration_test.go:260 +0xa0
Previous write at 0x00c0000ca170 by goroutine 59:
gitlab.com/gitlab-org/gitlab-runner/common.(*Trace).SetCancelFunc()
/builds/gitlab-org/gitlab-runner/common/trace.go:36 +0x43
gitlab.com/gitlab-org/gitlab-runner/common.(*Build).Run()
/builds/gitlab-org/gitlab-runner/common/build.go:713 +0x543
gitlab.com/gitlab-org/gitlab-runner/common/buildtest.RunBuildWithOptions()
/builds/gitlab-org/gitlab-runner/common/buildtest/test.go:34 +0x12b
gitlab.com/gitlab-org/gitlab-runner/executors/custom_test.TestBuildCancel.func1()
/builds/gitlab-org/gitlab-runner/common/buildtest/test.go:24 +0x4c1
gitlab.com/gitlab-org/gitlab-runner/shells/shellstest.OnEachShell.func1()
/builds/gitlab-org/gitlab-runner/shells/shellstest/utils.go:23 +0xd9
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
Goroutine 56 (running) created at:
time.goFunc()
/usr/local/go/src/time/sleep.go:168 +0x51
Goroutine 59 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:960 +0x651
gitlab.com/gitlab-org/gitlab-runner/shells/shellstest.OnEachShell()
/builds/gitlab-org/gitlab-runner/shells/shellstest/utils.go:18 +0x1b9
gitlab.com/gitlab-org/gitlab-runner/executors/custom_test.TestBuildCancel()
/builds/gitlab-org/gitlab-runner/executors/custom/integration_test.go:249 +0x44
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
==================
==================
WARNING: DATA RACE
Write at 0x00c0000dc1f0 by goroutine 56:
sync/atomic.CompareAndSwapInt32()
/usr/local/go/src/runtime/race_amd64.s:293 +0xb
sync.(*Mutex).Lock()
/usr/local/go/src/sync/mutex.go:74 +0x49
context.(*cancelCtx).cancel()
/usr/local/go/src/context/context.go:362 +0x6c
context.(*timerCtx).cancel()
/usr/local/go/src/context/context.go:440 +0x57
context.WithDeadline.func3()
/usr/local/go/src/context/context.go:416 +0x78
gitlab.com/gitlab-org/gitlab-runner/executors/custom_test.TestBuildCancel.func1.1()
/builds/gitlab-org/gitlab-runner/executors/custom/integration_test.go:260 +0xae
Previous write at 0x00c0000dc1f0 by goroutine 59:
context.WithDeadline()
/usr/local/go/src/context/context.go:401 +0x12e
context.WithTimeout()
/usr/local/go/src/context/context.go:464 +0x78
gitlab.com/gitlab-org/gitlab-runner/common.(*Build).Run()
/builds/gitlab-org/gitlab-runner/common/build.go:710 +0x4d2
gitlab.com/gitlab-org/gitlab-runner/common/buildtest.RunBuildWithOptions()
/builds/gitlab-org/gitlab-runner/common/buildtest/test.go:34 +0x12b
gitlab.com/gitlab-org/gitlab-runner/executors/custom_test.TestBuildCancel.func1()
/builds/gitlab-org/gitlab-runner/common/buildtest/test.go:24 +0x4c1
gitlab.com/gitlab-org/gitlab-runner/shells/shellstest.OnEachShell.func1()
/builds/gitlab-org/gitlab-runner/shells/shellstest/utils.go:23 +0xd9
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
Goroutine 56 (running) created at:
time.goFunc()
/usr/local/go/src/time/sleep.go:168 +0x51
Goroutine 59 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:960 +0x651
gitlab.com/gitlab-org/gitlab-runner/shells/shellstest.OnEachShell()
/builds/gitlab-org/gitlab-runner/shells/shellstest/utils.go:18 +0x1b9
gitlab.com/gitlab-org/gitlab-runner/executors/custom_test.TestBuildCancel()
/builds/gitlab-org/gitlab-runner/executors/custom/integration_test.go:249 +0x44
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
==================
Proposal
The problem here is that we call cancel after 2 seconds no matter what. As we can see in the stack trace the problem is because we call cancelFunc whilst is gets set up in build.Run.
Instead of sleeping we should look at the build state however we would need to protect this with a mutex to make sure we don't have any data races on read/writing to that field, having 2 functions 1 to set and 1 to get the value and locking the mutex might be the best approach.
Edited by Steve Xuereb