  • teach fast-export an --anonymize option · a8722750
    Jeff King authored and Junio C Hamano committed
    
    
    Sometimes users want to report a bug they experience on
    their repository, but they are not at liberty to share the
    contents of the repository. It would be useful if they could
    produce a repository that has a similar shape to its history
    and tree, but without leaking any information. This
    "anonymized" repository could then be shared with developers
    (assuming it still replicates the original problem).
    
    This patch implements an "--anonymize" option to
    fast-export, which generates a stream that can recreate such
    a repository. Producing a single stream makes it easy for
    the caller to verify that they are not leaking any useful
    information. You can get an overview of what will be shared
    by running a command like:
    
      git fast-export --anonymize --all |
      perl -pe 's/\d+/X/g' |
      sort -u |
      less
    
    which will show every unique line we generate, modulo any
    numbers (each anonymized token is assigned a number, like
    "User 0", and we replace it consistently in the output).
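
    As a rough sketch of what the number-collapsing step does, the
    filter can be wrapped in a small shell function. The name
    collapse_numbers is hypothetical, and POSIX sed is swapped in for
    the perl one-liner above; it is meant to behave the same way for
    ASCII digits.

```shell
# Hypothetical helper (not part of git): replace every run of digits
# with "X" and keep one copy of each resulting line, mirroring the
# perl | sort -u steps shown above.
collapse_numbers() {
    sed 's/[0-9][0-9]*/X/g' | sort -u
}
```

    It would be used in place of the middle of the pipeline, e.g.
    "git fast-export --anonymize --all | collapse_numbers | less".
    Lines that differ only in their token numbers (say "User 0" and
    "User 12") collapse into a single "User X" line.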
    
    In addition to anonymizing, this produces test cases that
    are relatively small (compared to the original repository)
    and fast to generate (compared to using filter-branch, or
    modifying the output of fast-export yourself). Here are
    numbers for git.git:
    
      $ time git fast-export --anonymize --all \
             --tag-of-filtered-object=drop >output
      real    0m2.883s
      user    0m2.828s
      sys     0m0.052s
    
      $ gzip output
      $ ls -lh output.gz | awk '{print $5}'
      2.9M
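
    Because the output is an ordinary fast-import stream, the
    shareable repository can be rebuilt by feeding it to
    git fast-import. The following is a minimal sketch under that
    assumption; the function name anonymize_into and its two
    arguments are made up for illustration.

```shell
# Hypothetical helper: export an anonymized stream from the repository
# at $1 and import it into a newly created repository at $2.
anonymize_into() {
    src=$1
    dst=$2
    git init --quiet "$dst" &&
    git -C "$src" fast-export --anonymize --all |
    git -C "$dst" fast-import --quiet
}
```

    For example, "anonymize_into . /tmp/anon" should leave a
    repository in /tmp/anon with the same number of commits as the
    original. Whether it still reproduces the original bug has to be
    checked by hand before sharing it.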
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>