casupload: fix `--output-digest-file` for directories, reduce ambiguity

Santiago Gil requested to merge santigl/dir-output-digest into master

Before raising this MR, consider whether the following are required, and complete if so:

  • [ ] Unit tests
  • [ ] Metrics
  • [ ] Documentation update(s)

If not required, please explain in brief why not.

Description

Current behavior

casupload takes a list of file paths, places those files under a new directory, and uploads that directory. For example, casupload foo.txt subdir/bar.txt uploads a directory consisting of:

 / <-- (the digest of this directory is returned)
 |-- foo.txt
 |-- subdir/
      |-- bar.txt

If a directory is specified in the list of paths, it will be uploaded individually to CAS. That is, casupload foo.txt subdir/bar.txt /tmp/foo will produce the same directory as above and will also upload /tmp/foo as its own directory.
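The current handling of positional arguments can be sketched as follows. This is a minimal illustration, not casupload's actual code: the function name partition_paths and the injectable is_dir predicate are hypothetical, introduced only to show which paths end up in the implicit directory and which are uploaded on their own.

```python
import os
from typing import Callable, Iterable, List, Tuple


def partition_paths(
    paths: Iterable[str],
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> Tuple[List[str], List[str]]:
    """Split positional arguments into (files, directories).

    Hypothetical helper: files are the paths that would be merged into
    the implicit root directory; directories are the paths that would
    each be uploaded individually.
    """
    files: List[str] = []
    directories: List[str] = []
    for path in paths:
        if is_dir(path):
            directories.append(path)
        else:
            files.append(path)
    return files, directories
```

Under this sketch, foo.txt and subdir/bar.txt would land in the first list and /tmp/foo in the second, assuming /tmp/foo is a directory.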

Issue with --output-digest-file

This option makes casupload write the digest of the uploaded directory to a file. Currently, that is always the digest of the implicit directory created from the files given as positional arguments.

However, if casupload is called with only a directory (e.g., casupload /tmp/foo --output-digest-file=/tmp/foo.digest), no digest file is written.

Changes proposed in this merge request:

  • Write the digest file when called with a single directory and --output-digest-file=FILE
  • Assert that if --output-digest-file is set, the <paths> list is either all files or a single directory. This is potentially a breaking change, but it removes the ambiguity about which digest is written to the file.
  • .gitlab-ci.yml: Add test_upload_and_download job that uploads a directory with casupload, retrieves it back with casdownload, and checks that the contents match (uses --output-digest-file to write the digest of the uploaded directory).
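The proposed assertion on <paths> could look roughly like the sketch below. It is written in Python for illustration only and does not reproduce the change in this MR: the name check_paths_for_digest_file, the ValueError, and the is_dir parameter are all assumptions.

```python
import os
from typing import Callable, Optional, Sequence


def check_paths_for_digest_file(
    paths: Sequence[str],
    output_digest_file: Optional[str],
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> None:
    """Raise ValueError if the digest to write would be ambiguous.

    Hypothetical check: with --output-digest-file set, <paths> must be
    either all files or exactly one directory.
    """
    if output_digest_file is None:
        return  # Option not set: any mix of paths is accepted.

    directory_count = sum(1 for path in paths if is_dir(path))
    if directory_count == 0:
        return  # All files: the digest of the implicit directory is written.
    if directory_count == 1 and len(paths) == 1:
        return  # Single directory: its own digest is written.

    raise ValueError(
        "--output-digest-file requires either a list of files "
        "or a single directory"
    )
```

Injecting is_dir keeps the check independent of the filesystem, which makes the four input combinations easy to exercise in a unit test.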

Alternatively, we could clearly document that the digest written is always the one for the implicit directory, or make casupload write multiple digests, one per line, to the specified file.
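For comparison, the multiple-digest alternative might be sketched as below. The function write_digests is hypothetical, and the hash/size_bytes line format is an assumption made for the example, not a statement about what casupload writes.

```python
from typing import Iterable, Tuple


def write_digests(digest_file: str, digests: Iterable[Tuple[str, int]]) -> None:
    """Write one digest per line to digest_file.

    Each digest is a (hash, size_bytes) pair. The "hash/size_bytes"
    text format is assumed for illustration only.
    """
    with open(digest_file, "w", encoding="utf-8") as output:
        for digest_hash, size_bytes in digests:
            output.write(f"{digest_hash}/{size_bytes}\n")
```

A consumer of such a file would still need to know which line corresponds to which path, which is the ambiguity the assertion proposed above avoids.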

Validation

Run casupload with different combinations of inputs, setting --output-digest-file=digest.file each time:

  • casupload file1 file2: digest.file contains the digest of a directory containing file1 and file2
  • casupload dir: digest.file contains the digest of the directory
  • casupload file1 dir: fails with an error
  • casupload dir1 dir2: also fails with an error
Edited by Santiago Gil
