• Jeff King's avatar
    execv_dashed_external: wait for child on signal death · 46df6906
    Jeff King authored
    When you hit ^C to interrupt a git command going to a pager,
    this usually leaves the pager running. But when a dashed
    external is in use, the pager ends up in a funny state and
    quits (but only after eating one more character from the
    terminal!). This fixes it.
    Explaining the reason will require a little background.
    When git runs a pager, it's important for the git process to
    hang around and wait for the pager to finish, even though it
    has no more data to feed it. This is because git spawns the
    pager as a child, and thus the git process is the session
    leader on the terminal. After it dies, the pager will finish
    its current read from the terminal (eating the one
    character), and then get EIO trying to read again.
    When you hit ^C, that sends SIGINT to git and to the pager,
    and it's a similar situation.  The pager ignores it, but the
    git process needs to hang around until the pager is done. We
    addressed that long ago in a3da8821 (pager: do
    wait_for_pager on signal death, 2009-01-22).
    But when you have a dashed external (or an alias pointing to
    a builtin, which will re-exec git for the builtin), there's
    an extra process in the mix. For instance, running:
      $ git -c alias.l=log l
    will end up with a process tree like:
      git (parent)
         git-log (child)
           less (pager)
    If you hit ^C, SIGINT goes to all of them. The pager ignores
    it, and the child git process will end up in wait_for_pager().
    But the parent git process will die, and the usual EIO
    trouble happens.
    So we really want the parent git process to wait_for_pager(),
    but of course it doesn't know anything about the pager at
    all, since it was started by the child.  However, we can
    have it wait on the git-log child, which in turn is waiting
    on the pager. And that's what this patch does.
    There are a few design decisions here worth explaining:
      1. The new feature is attached to run-command's
         clean_on_exit feature. Partly this is convenience,
         since that feature already has a signal handler that
         deals with child cleanup.
         But it's also a meaningful connection. The main reason
         that dashed externals use clean_on_exit is to bind the
         two processes together. If somebody kills the parent
         with a signal, we propagate that to the child (in this
         instance with SIGINT, we do propagate but it doesn't
         matter because the original signal went to the whole
         process group). Likewise, we do not want the parent
         to go away until the child has done so.
         In a traditional Unix world, we'd probably accomplish
         this binding by just having the parent execve() the
         child directly. But since that doesn't work on Windows,
         everything goes through run_command's more spawn-like
      2. We do _not_ automatically waitpid() on any
         clean_on_exit children. For dashed externals this makes
         sense; we know that the parent is doing nothing but
         waiting for the child to exit anyway. But with other
         children, it's possible that the child, after getting
         the signal, could be waiting on the parent to do
         something (like closing a descriptor). If we were to
         wait on such a child, we'd end up in a deadlock. So
         this errs on the side of caution, and lets callers
         enable the feature explicitly.
      3. When we send children the cleanup signal, we send all
         the signals first, before waiting on any children. This
         is to avoid the case where one child might be waiting
         on another one to exit, causing a deadlock. We inform
         all of them that it's time to die before reaping any.
         In practice, there is only ever one dashed external run
         from a given process, so this doesn't matter much now.
         But it future-proofs us if other callers start using
         the wait_after_clean mechanism.
    There's no automated test here, because it would end up racy
    and unportable. But it's easy to reproduce the situation by
    running the log command given above and hitting ^C.
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
run-command.c 28.9 KB