Draft: add an 'fopen' wrapper for setting close-on-exec and demonstrate its use
This is a proposal for something we could do tree-wide to every fopen
call in lib/ and plugin/. If it looks reasonable, I’ll update this MR with use of this new function in all those places.
When a program performs fork
+exec
,¹ the child process by default inherits
any open file descriptors. This is usually undesirable because the purpose of
exec
is usually to run a separate program that should not have the ability to
operate on the parent’s open files. It is generally recommended that library
code always set the “close-on-exec” flag on file descriptors it creates, because
it does not know the intentions of its caller.²
Graphviz is generally not doing this recommended action anywhere. This change begins a series of steps to conform to this practice.
In the case of this initial usage, in config_rescan
, the file handle is opened
and closed within the scope of a single function. So it may not be immediately
clear why setting close-on-exec is still recommended. The reason is that another
thread may be calling fork
+exec
. Although the file handle appears to not
outlive config_rescan
, any child process created while the file is open will
get a descriptor to this open file. This code cannot know if the rest of the
application it is hosted within is multithreaded or not. In fact, because fork
is signal-safe, this problem can occur even in a single-threaded process with
signal handlers. This race is rare but leads to very subtle bugs that are hard
to diagnose.³
On top of the above, there are a few additional subtleties:
-
Using
fcntl
to set close-on-exec on a file descriptor you have just created is insufficient. There is a window in between when you create the file descriptor and when you set close-on-exec that another thread can runfork
+exec
. To address this, most platforms have an extra 'e' option tofopen
andO_CLOEXEC
toopen
. This change attempts to use 'e' or equivalent mechanisms to avoid the race if at all possible. -
At first glance, this whole situation seems inapplicable to Windows. Its recommended APIs already set the equivalent of close-on-exec by default. However
open
(or_open
) on Windows is a low level C interface that does not set this by default. -
From the macOS man pages, you would be led to believe there is no way to avoid the race described in (1). However,
O_CLOEXEC
is supported despite being undocumented in theopen
man page.⁴ -
There seems to be no way to directly probe for 'e' (or 'N') support in
fopen
. Implementations do not reliably reject mode characters they do not support, sofopen
-ing with 'e' can silently fail. You canfcntl(fileno(…), F_GETFD)
to learn whether 'e' had an effect, but if it did not you have just run the race in (1) and potentially leaked a descriptor. We could craft some non-trivial build-time check for this, but the complexity of having this work across all three build systems as well as during cross-compilation does not seem worth it. This change chooses to just compile-time branch on the availability of various mechanisms based on preprocessor definitions.
Glibc’s open
man page has a succinct summary of the fundamental problem:
O_CLOEXEC
Enable the close-on-exec flag for the new file descriptor. Specifying this flag permits a program to avoid additionalfcntl
(2)F_SETFD
operations to set theFD_CLOEXEC
flag.Note that the use of this flag is essential in some multithreaded programs, because using a separate
fcntl
(2)F_SETFD
operation to set theFD_CLOEXEC
flag does not suffice to avoid race conditions where one thread opens a file descriptor and attempts to set its close-on-exec flag usingfcntl
(2) at the same time as another thread does afork
(2) plusexecve
(2). Depending on the order of execution, the race may lead to the file descriptor returned byopen
() being unintentionally leaked to the program executed by the child process created byfork
(2). (This kind of race is in principle possible for any system call that creates a file descriptor whose close-on-exec flag should be set, and various other Linux system calls provide an equivalent of theO_CLOEXEC
flag to deal with this problem.)
¹ One of the generalized family of fork
functions (fork
, vfork
, etc.)
followed by the child process running one of the generalized family of exec
functions (execve
, execvp
, etc.). This includes posix_spawn
that
effectively wraps this sequence.
² https://udrepper.livejournal.com/20407.html
³ https://danwalsh.livejournal.com/53603.html
⁴ There are reports O_CLOEXEC
did not work on old versions of macOS
(https://groups.google.com/g/golang-dev/c/7mmT7o9GYb4), but these versions are
long out of support.