Fix TestBuildCancel from timing out

Steve Xuereb requested to merge 27077-testbuildcancel-hangs into master

What does this MR do?

Prevents the TestBuildCancel test from timing out.

Why was this MR needed?

In #27077 (closed) we are seeing timeouts of the TestBuildCancel test. When the job is canceled, we send SIGTERM to the process that is running. If that process hasn't terminated after a certain amount of time, we send SIGKILL; the default wait is 10 minutes. Since the default timeout of Go tests is also 10 minutes, our test ends up timing out because SIGKILL hasn't been sent yet.

Make GracefulKillTimeout and ForceKillTimeout configurable, and set them to low values for tests so we don't waste time waiting for the process to terminate, since the processes don't usually have any cleanup to do.

What's the best way to test this MR?

Mimic what happens in CI, where a process does not terminate when asked to. For example:

  1. Build a binary from a Go program that catches the SIGTERM signal and then waits for 1 minute before exiting. Put it in /tmp/siglistener, the path that the diff in step 2 calls.

  2. Update GetRemoteLongRunningBuild to call that instead of the sleep command:

    diff --git a/common/support.go b/common/support.go
    index 73d30d160..47dfe6b64 100644
    --- a/common/support.go
    +++ b/common/support.go
    @@ -220,7 +220,7 @@ func GetLongRunningBuild() (JobResponse, error) {
     }
    
     func GetRemoteLongRunningBuild() (JobResponse, error) {
    -       return GetRemoteBuildResponse("sleep 3600")
    +       return GetRemoteBuildResponse("/tmp/siglistener")
     }
    
     func GetRemoteLongRunningBuildCMD() (JobResponse, error) {
  3. Run go test -timeout=45s -v -run TestBuildCancel/pwsh/job_is_canceling ./executors/shell/. The test will not time out.

  4. Remove the GracefulKillTimeout override for tests to reproduce the timeout:

    diff --git a/executors/shell/shell_integration_test.go b/executors/shell/shell_integration_test.go
    index 111cc2ac3..3309f2956 100644
    --- a/executors/shell/shell_integration_test.go
    +++ b/executors/shell/shell_integration_test.go
    @@ -94,11 +94,11 @@ func newBuild(t *testing.T, getBuildResponse common.JobResponse, shell string) (
                    JobResponse: getBuildResponse,
                    Runner: &common.RunnerConfig{
                            RunnerSettings: common.RunnerSettings{
    -                               BuildsDir:           dir,
    -                               Executor:            "shell",
    -                               Shell:               shell,
    -                               GracefulKillTimeout: func(i int) *int { return &i }(5),
    -                               ForceKillTimeout:    func(i int) *int { return &i }(1),
    +                               BuildsDir: dir,
    +                               Executor:  "shell",
    +                               Shell:     shell,
    +                               // GracefulKillTimeout: func(i int) *int { return &i }(5),
    +                               ForceKillTimeout: func(i int) *int { return &i }(1),
                            },
                    },
                    SystemInterrupt: make(chan os.Signal, 1),
  5. Run go test -timeout=45s -v -run TestBuildCancel/pwsh/job_is_canceling ./executors/shell/ again. This time the test will time out.

What are the relevant issue numbers?

Closes #27077 (closed)
