Improve gzip compression handling for maintainability
In https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14852, we enabled gzip compression by modifying gsutil to include the -z option in rsync, so that all files with the specified extensions are compressed before upload and tagged with Content-Encoding: gzip metadata. This lets Google Cloud Storage serve these assets in gzip and non-gzip form transparently (gitlab-com/www-gitlab-com!96011 (merged)).
The problem, as https://github.com/GoogleCloudPlatform/gsutil/pull/1430#issuecomment-1010234090 describes, is that -z wasn't included in gsutil rsync because rsync relies on comparing checksums of the source and the destination to decide what to upload. The object stored in GCS carries the checksum of the compressed data, while the local source file is uncompressed, so the two never match. Therefore, all files with the matching extensions will be re-uploaded on every sync.
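To illustrate why the checksums can never agree, here is a small sketch that hashes the same content before and after gzip compression. The file name and content are made up for the demonstration, and it assumes GNU md5sum is available. It does not reproduce gsutil's actual comparison logic, only the underlying mismatch.

```shell
#!/bin/sh
# Compare the MD5 of a file's raw bytes with the MD5 of its gzip-compressed bytes.
# sample.html is a throwaway file created only for this demonstration.
tmpdir=$(mktemp -d)
printf '<html><body>hello</body></html>\n' > "$tmpdir/sample.html"

# Hash of the file as it exists on disk (uncompressed).
raw_md5=$(md5sum < "$tmpdir/sample.html" | cut -d' ' -f1)

# Hash of the gzip-compressed bytes; -n omits the name and timestamp from the header.
gz_md5=$(gzip -c -n "$tmpdir/sample.html" | md5sum | cut -d' ' -f1)

echo "uncompressed: $raw_md5"
echo "compressed:   $gz_md5"
```

Because the two digests differ, any tool that decides whether to copy a file by comparing them will treat the file as changed every time.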
Since we probably want to avoid using the hacked gsutil, we have a few options:
- Use find and gsutil together. Something like: find . -name \*.html -o -name \*.txt | xargs -I {} gsutil cp -Z {} gs://bucket-dest
- Revisit application compression: !107 (closed), gitlab-com/www-gitlab-com!96012 (closed)
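The first option could be sketched roughly as follows. This is only an illustration under a few assumptions: upload_compressed and DRY_RUN are hypothetical names, gs://bucket-dest is the placeholder bucket from the command above, and the destination path is built from each file's relative path so that files in subdirectories do not collide. File names containing newlines are not handled.

```shell
#!/bin/sh
# Upload every .html and .txt file under a directory with gzip content-encoding.
# upload_compressed and DRY_RUN are illustrative names, not part of gsutil.
upload_compressed() {
    src_dir=$1
    dest=$2   # e.g. gs://bucket-dest (placeholder)
    (
        cd "$src_dir" || exit 1
        find . -type f \( -name '*.html' -o -name '*.txt' \) |
        while IFS= read -r path; do
            rel=${path#./}   # strip the leading "./" that find adds
            # With DRY_RUN set, print the command instead of running it.
            ${DRY_RUN:+echo} gsutil cp -Z "$rel" "$dest/$rel"
        done
    )
}
```

Running it as DRY_RUN=1 upload_compressed public gs://bucket-dest (where public is a hypothetical build directory) prints the gsutil commands without executing them, which makes it easy to check the file selection before uploading anything.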
I personally think the first option is easier, but I'll defer to @tywilliams and @laurenbarker for how they would like to handle this going forward.