  • convert: add filter.<driver>.process option · edcc8581
    Lars Schneider authored and Junio C Hamano committed
    Git's clean/smudge mechanism invokes an external filter process for
    every single blob that is affected by a filter. If Git filters a lot of
    blobs, the startup time of the external filter processes can become
    a significant part of the overall Git execution time.
    
    In a preliminary performance test, this developer used a clean/smudge
    filter written in Go to filter 12,000 files. This process took 364s
    with the existing filter mechanism and 5s with the new mechanism. See
    details here: https://github.com/github/git-lfs/pull/1382

    This patch adds the `filter.<driver>.process` string option which, if
    used, keeps the external filter process running and processes all blobs
    through a protocol based on the packet format (pkt-line) over standard
    input and standard output. The full protocol is explained in detail in
    `Documentation/gitattributes.txt`.
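
The pkt-line framing mentioned above can be sketched as follows. This is a
minimal illustration in Python, not code from this patch: each packet is a
4-digit hexadecimal length (which counts the 4 length bytes themselves)
followed by the payload, and `0000` is the flush packet. The helper names
`pkt_line` and `read_pkt` are hypothetical, and the 65516-byte payload cap
is assumed from Git's 65520-byte packet limit.

```python
import io
from typing import BinaryIO, Optional

FLUSH_PKT = b"0000"   # a length of zero marks a flush packet
MAX_DATA = 65516      # assumed payload cap: 65520-byte packet minus 4 length bytes


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line: 4 hex digits of total length, then data."""
    if len(payload) > MAX_DATA:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt(stream: BinaryIO) -> Optional[bytes]:
    """Read one pkt-line from stream; return None when a flush packet is read."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended inside a pkt-line header")
    size = int(header, 16)
    if size == 0:
        return None
    if size < 4:
        raise ValueError("invalid pkt-line length")
    return stream.read(size - 4)


if __name__ == "__main__":
    # Round trip: one data packet followed by a flush packet.
    wire = io.BytesIO(pkt_line(b"version=2\n") + FLUSH_PKT)
    print(read_pkt(wire), read_pkt(wire))
```

A long-running filter would apply this framing to everything it reads from
standard input and writes to standard output.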
    
    A few key decisions:
    
    * The long-running filter process is referred to as filter protocol
      version 2 because the existing single-shot filter invocation is
      considered version 1.
    * Git sends a welcome message and expects a response right after the
      external filter process has started. This ensures that Git will not
      hang if a version 1 filter is incorrectly used with the
      filter.<driver>.process option for version 2 filters. In addition,
      Git can detect this kind of error and warn the user.
    * The status of a filter operation (e.g. "success" or "error") is set
      before the actual response and (if necessary!) re-set after the
      response. The advantage of this two-step status response is that if
      the filter detects an error early, then the filter can communicate
      this and Git does not even need to create structures to read the
      response.
    * All status responses are pkt-line lists terminated with a flush
      packet. This allows us to send other status fields with the same
      protocol in the future.
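
The handshake and the two-step status response described in the list above
could look roughly like this from the filter's side. This is a hedged Python
sketch, not the implementation in this patch. The literal strings
`git-filter-client`, `git-filter-server`, `version=2`, `status=success` and
`status=error` are assumptions based on the protocol description in
`Documentation/gitattributes.txt` and should be checked against it; the
function names are hypothetical.

```python
import io
from typing import BinaryIO, List, Optional

FLUSH_PKT = b"0000"
MAX_DATA = 65516  # assumed payload cap per pkt-line


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line (4 hex digits of total length + data)."""
    return b"%04x" % (len(payload) + 4) + payload


def read_list(stream: BinaryIO) -> List[str]:
    """Read pkt-lines until a flush packet; return lines without the newline."""
    lines = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError("stream ended inside a pkt-line header")
        size = int(header, 16)
        if size == 0:
            return lines
        lines.append(stream.read(size - 4).decode().rstrip("\n"))


def handshake(stdin: BinaryIO, stdout: BinaryIO) -> None:
    """Answer Git's welcome message so that Git knows this is a version 2 filter."""
    hello = read_list(stdin)
    if hello[:1] != ["git-filter-client"] or "version=2" not in hello:
        raise RuntimeError("unexpected welcome message")
    stdout.write(pkt_line(b"git-filter-server\n")
                 + pkt_line(b"version=2\n") + FLUSH_PKT)


def status_list(**fields: str) -> bytes:
    """Encode key=value fields as a pkt-line list terminated by a flush packet."""
    body = b"".join(pkt_line(f"{k}={v}\n".encode()) for k, v in fields.items())
    return body + FLUSH_PKT


def respond(content: Optional[bytes], late_error: bool = False) -> bytes:
    """Build the filter's reply for one blob using the two-step status."""
    if content is None:
        # Early error: status only, so Git never has to read a response body.
        return status_list(status="error")
    out = status_list(status="success")          # step 1: status before content
    for i in range(0, len(content), MAX_DATA):   # content split into pkt-lines
        out += pkt_line(content[i:i + MAX_DATA])
    out += FLUSH_PKT                             # end of content
    # Step 2: an empty list keeps the earlier status; otherwise it is re-set.
    out += status_list(status="error") if late_error else status_list()
    return out
```

Because every status is a flush-terminated list of `key=value` lines, further
fields can be added later without changing the framing, which is the
extensibility point made in the last bullet.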
    
    Helped-by: Martin-Louis Bright <mlbright@gmail.com>
    Reviewed-by: Jakub Narebski <jnareb@gmail.com>
    Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>