Add Git LFS smudge filter

Stan Hu requested to merge sh-gitaly-lfs-smudge-filter into master

This commit adds a new binary, gitaly-lfs-smudge, that will be used to include Git LFS blobs inside a project archive file.

The smudge filter will eventually be invoked by passing the -c filter.lfs.smudge option to Git when the include_lfs_blobs flag is enabled.
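A minimal sketch of how the Git arguments might be assembled, assuming the filter is wired in through `git -c filter.lfs.smudge=<command>`. The function name `archiveArgs` and the binary path are hypothetical, and the actual Gitaly change (!2621) may differ. Note that Git only runs the filter for paths marked `filter=lfs` in `.gitattributes`.

```go
package main

import (
	"fmt"
	"os/exec"
)

// archiveArgs builds the argument list for `git archive`.
// smudgePath is a hypothetical install location for gitaly-lfs-smudge.
func archiveArgs(smudgePath, treeish string, includeLFSBlobs bool) []string {
	var args []string
	if includeLFSBlobs {
		// -c sets a one-off config value for this Git invocation only.
		args = append(args, "-c", "filter.lfs.smudge="+smudgePath)
	}
	return append(args, "archive", treeish)
}

func main() {
	args := archiveArgs("/usr/local/bin/gitaly-lfs-smudge", "master", true)
	cmd := exec.Command("git", args...)
	// Print the command line instead of running it; this is only a sketch.
	fmt.Println(cmd.Args)
}
```

When the flag is off, no filter is configured and LFS pointers are archived as-is.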

The filter works as follows:

  1. Read the LFS pointer from STDIN.
  2. Decode the OID, if one is available.
  3. Make an internal API call to Workhorse/Rails for this OID.
  4. Write the contents of the file to STDOUT.

If any error is encountered, the filter logs it and exits with a non-zero exit code.

Part of gitlab#15079 (closed)

There is much more work to do to enable this feature:

  1. Add include_lfs_blobs flag in GetArchiveRequest RPC: !2607 (merged)
  2. Configure git archive to use filter.lfs.smudge with this binary: !2621 (merged)
  3. Make Workhorse and Rails send include_lfs_blobs: gitlab!44116 (merged)
  4. Support GIT_LFS_SKIP_SMUDGE and --skip flag in smudge filter (?)
  5. Add logging to smudge filter
  6. Make Workhorse distinguish a cached archive with and without LFS pointers

Data flow:

sequenceDiagram
     Client->>+Workhorse: GET /group/project/-/archive/master.zip
     Workhorse->>+Rails: GET /group/project/-/archive/master.zip
     Rails->>+Workhorse: Gitlab-Workhorse-Send-Data git-archive
     Workhorse->>Gitaly: SendArchiveRequest
     Gitaly->>Git: git archive master
     Git->>Smudge: OID 12345
     Smudge->>+Workhorse: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
     Workhorse->>+Rails: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
     Rails->>+Workhorse: Gitlab-Workhorse-Send-Data send-url
     Workhorse->>Smudge: <LFS data>
     Smudge->>Git: <LFS data>
     Git->>Gitaly: <streamed data>
     Gitaly->>Workhorse: <streamed data>
     Workhorse->>Client: master.zip
Edited by Stan Hu
