Workhorse: gitlab-zip-metadata requires too many HTTP Range Requests with Go v1.17
The upgrade to Go v1.17 caused a performance incident today while processing CI artifacts: gitlab-com/gl-infra/production#5521
Go v1.17 introduced a call to f.readDataDescriptor in https://go-review.googlesource.com/c/go/+/312310/14/src/archive/zip/reader.go#120 that causes extra HTTP Range Requests to be issued for each file in the archive. More discussion about the interface change is in https://github.com/golang/go/issues/34974.
An upstream issue has been filed in https://github.com/golang/go/issues/48374.
It looks like this call was added to support ZIP64 (useful sources: https://github.com/golang/go/issues/34974#issuecomment-829971988, https://www.artpol-software.com/ZipArchive/KB/0610051629.aspx, https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT). We may want to report this issue upstream because it's not clear to me that the performance implications were considered here.
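A minimal sketch of how the extra reads could be measured locally, assuming each ReadAt call on the archive corresponds roughly to one HTTP Range Request in the network case. The names countingReaderAt, buildArchive, and countReads are hypothetical and are not taken from Workhorse. The count depends on the Go version used to build it, so no expected number is given here.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// countingReaderAt wraps an io.ReaderAt and counts ReadAt calls.
// Assumption: over the network, each call would be about one Range Request.
type countingReaderAt struct {
	r     io.ReaderAt
	calls int
}

func (c *countingReaderAt) ReadAt(p []byte, off int64) (int, error) {
	c.calls++
	return c.r.ReadAt(p, off)
}

// buildArchive creates an in-memory ZIP with n small files.
// zip.Writer.Create writes entries that carry a data descriptor.
func buildArchive(n int) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for i := 0; i < n; i++ {
		f, err := w.Create(fmt.Sprintf("file-%d.txt", i))
		if err != nil {
			return nil, err
		}
		if _, err := f.Write([]byte("hello")); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// countReads only opens the archive and lists its entries (metadata only).
// It returns the number of entries and the number of ReadAt calls made.
func countReads(data []byte) (entries, calls int, err error) {
	cr := &countingReaderAt{r: bytes.NewReader(data)}
	zr, err := zip.NewReader(cr, int64(len(data)))
	if err != nil {
		return 0, 0, err
	}
	return len(zr.File), cr.calls, nil
}

func main() {
	data, err := buildArchive(100)
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	entries, calls, err := countReads(data)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("entries=%d ReadAt calls=%d\n", entries, calls)
}
```

Building this with different Go toolchains and comparing the ReadAt count for the same archive should show whether the number of reads grows with the number of files.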
Right now, I'm not sure how we can easily work around this issue in Go v1.17. I'm not sure if the OpenRaw interface works here since gitlab-zip-metadata can extract a file either from the network or from a local file. In the network case, we only have an io.Reader, so we'd need to save the file to disk first for this to work.
Another option may be to look at a third-party library (e.g. https://github.com/mholt/archiver) that might be able to support HTTP Range Requests more effectively.