Fix race blocking goroutine in shell executor

Steve Xuereb requested to merge fix/executor-shell-blocking-goroutine into master

What does this MR do?

Fix race blocking goroutine in shell executor

Why was this MR needed?

A goroutine sends to waitCh while a select block receives from that channel. The same select also waits on ctx.Done(). If ctx.Done() fires first, or both cases are ready and the Go runtime happens to pick ctx.Done(), the function returns and nothing ever receives from waitCh. The goroutine's send then blocks forever and the goroutine leaks.

This class of bug is discussed in detail in Section 1 (Introduction) of https://songlh.github.io/paper/gcatch.pdf:

A previously unknown concurrency bug in Docker is shown in Figure 1. Function Exec() creates a child goroutine at line 5 to duplicate the content of a.Reader. After the duplication, the child goroutine sends err to the parent goroutine through channel outDone to notify the parent about completion and any possible error (line 7). Since outDone is an unbuffered channel (line 3), the child blocks at line 7 until the parent receives from outDone. Meanwhile, the parent blocks at the select at line 9 until it either receives err from the child (line 10) or receives a message from ctx.Done() (line 13), indicating the entire task can be halted. If the message from ctx.Done() arrives earlier, or if the two messages arrive concurrently and Go’s runtime non-deterministically chooses the second case to execute, the parent will return from function Exec(). No other goroutine can pull messages from outDone, leaving the child goroutine permanently blocked at line 7.

What's the best way to test this MR?

N/A, since this is a timing-dependent race condition.

What are the relevant issue numbers?

#27892 (closed)

Edited by Steve Xuereb

Merge request reports