    http-backend: spool ref negotiation requests to buffer · 6bc0cb51
    Jeff King authored and Junio C Hamano committed
    When http-backend spawns "upload-pack" to do ref
    negotiation, it streams the http request body to
    upload-pack, which then streams the http response back to the
    client as it reads. In theory, git can go full-duplex; the
    client can consume our response while it is still sending
    the request.  In practice, however, HTTP is a half-duplex
    protocol. Even if our client is ready to read and write
    simultaneously, we may have other HTTP infrastructure in the
    way, including the webserver that spawns our CGI, or any
    intermediate proxies.
    
    In at least one documented case[1], this leads to deadlock
    when trying a fetch over http. What happens is basically:
    
      1. Apache proxies the request to the CGI, http-backend.
    
      2. http-backend gzip-inflates the data and sends
         the result to upload-pack.
    
      3. upload-pack acts on the data and generates output over
         the pipe back to Apache. Apache isn't reading because
         it's busy writing (step 1).
    
    This works fine most of the time, because the upload-pack
    output ends up in a system pipe buffer, and Apache reads
    it as soon as it finishes writing. But if both the request
    and the response exceed the system pipe buffer size, then we
    deadlock (Apache blocks writing to http-backend,
    http-backend blocks writing to upload-pack, and upload-pack
    blocks writing to Apache).
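
    The pipe-buffer limit that triggers this deadlock is easy to
    observe. As an illustrative sketch (Python here purely for
    demonstration, not git's own code), the snippet below fills one
    end of a pipe in non-blocking mode and counts how many bytes fit
    before a writer would block:

```python
import os

def pipe_capacity():
    """Measure how many bytes a pipe accepts before a writer would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # writes now raise instead of stalling
    total = 0
    chunk = b"x" * 4096
    try:
        while True:
            total += os.write(w, chunk)
    except BlockingIOError:
        pass  # pipe is full; a blocking writer would now be stuck
    finally:
        os.close(r)
        os.close(w)
    return total

if __name__ == "__main__":
    # On Linux this is typically 65536 bytes; once both the request and
    # the response exceed it, all three processes in the scenario above
    # block on writes and none of them ever reads.
    print(pipe_capacity())
```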
    
    We need to break the deadlock by spooling either the input
    or the output. In this case, it's ideal to spool the input,
    because Apache does not start reading either stdout _or_
    stderr until we have consumed all of the input. So until we
    do so, we cannot even get an error message out to the
    client.
    
    The solution is fairly straightforward: we read the request
    body into an in-memory buffer in http-backend, freeing up
    Apache, and then feed the data ourselves to upload-pack. But
    there are a few important things to note:
    
      1. We limit the in-memory buffer to prevent an obvious
         denial-of-service attack. This is a new hard limit on
         requests, but it's unlikely to come into play. The
         default value is 10MB, which covers even the ridiculous
         100,000-ref negotiation in the included test (that
         actually caps out just over 5MB). But it's configurable
         on the off chance that you don't mind spending some
         extra memory to make even ridiculous requests work.
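
         As a sketch of how an operator would raise that cap
         (assuming the limit is exposed as the http.maxRequestBuffer
         config variable and a matching environment knob, as this
         series does):

```shell
# On the server side, allow request bodies up to 100 MB to be
# spooled in memory (the default is 10 MB):
git config http.maxRequestBuffer 100M

# Or set the same limit via the environment, e.g. in the CGI setup:
export GIT_HTTP_MAX_REQUEST_BUFFER=100M
```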
    
      2. We must take care only to buffer when we have to. For
         pushes, the incoming packfile may be of arbitrary
         size, and we should connect the input directly to
         receive-pack. There's no deadlock problem here, though,
         because we do not produce any output until the whole
         packfile has been read.
    
         For upload-pack's initial ref advertisement, we
         similarly do not need to buffer. Even though we may
         generate a lot of output, there is no request body at
         all (i.e., it is a GET, not a POST).
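
    The spool-when-we-must logic above can be sketched roughly as
    follows (Python pseudocode standing in for http-backend's actual
    C; read_request_body and MAX_REQUEST_BUFFER are illustrative
    names, not the commit's identifiers):

```python
import io

MAX_REQUEST_BUFFER = 10 * 1024 * 1024  # illustrative 10MB cap

def read_request_body(stream, limit=MAX_REQUEST_BUFFER):
    """Spool an entire request body into memory, rejecting oversized ones.

    Draining the webserver's input first frees it to start reading our
    output, which breaks the three-way deadlock described above.
    """
    buf = io.BytesIO()
    while True:
        chunk = stream.read(8192)
        if not chunk:
            return buf.getvalue()
        if buf.tell() + len(chunk) > limit:
            raise OSError("request body exceeds maximum buffer size")
        buf.write(chunk)
```

    For a push, http-backend would skip this step entirely and connect
    its input straight to receive-pack, since receive-pack produces no
    output until the whole packfile has been read.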
    
    [1] http://article.gmane.org/gmane.comp.version-control.git/269020
    
    
    
    Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net>
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>