Document and refactor Workhorse upload routines

Problem

The upload logic in Workhorse is hard to follow, and even harder to contribute to.

We have adopted a certain domain language around uploads, such as "direct upload", "disk buffering" and "upload encodings":

https://docs.gitlab.com/ee/development/uploads.html

However, these terms are barely reflected in the code, and almost all code modules that participate in upload logic are undocumented. In fact, a full-text search shows that the terms "direct upload" and "disk buffering" appear nowhere in Workhorse.

There appear to be 3 packages involved in uploads overall:

  • upload
  • filestore
  • objectstore

These packages are also not well documented, and I'm not sure their responsibilities are clearly separated. For instance, filestore contains routines related to uploading files, which I would expect to find in upload instead.

Function names are also often unclear. For instance, there appear to be two primary functions we use to wire up upload endpoints:

  • upload.BodyUploader
  • upload.Accelerate

What does Accelerate mean? Are body uploads not accelerated? Looking at its implementation, it appears to be specific to multipart uploads, as opposed to uploads that use the body encoding. On the surface, this looks like a false dichotomy.
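To illustrate what naming that follows the domain language could look like, here is a minimal Go sketch that makes the "upload encoding" distinction explicit. The type UploadEncoding, its constants, and the function DetectUploadEncoding are hypothetical names invented for this example. They do not exist in Workhorse, and the Content-Type check is an assumption about how the two encodings could be told apart, not a description of the current implementation.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// UploadEncoding names how a file is carried in an HTTP request.
// Hypothetical type for illustration only; not taken from Workhorse.
type UploadEncoding string

const (
	// EncodingMultipart: the file is one part of a multipart/* request body.
	EncodingMultipart UploadEncoding = "multipart"
	// EncodingBody: the request body itself is the file.
	EncodingBody UploadEncoding = "body"
)

// DetectUploadEncoding classifies a request by its Content-Type header.
// Assumption: anything that is not a parseable multipart/* media type is
// treated as a body upload.
func DetectUploadEncoding(contentType string) UploadEncoding {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err == nil && strings.HasPrefix(mediaType, "multipart/") {
		return EncodingMultipart
	}
	return EncodingBody
}

func main() {
	fmt.Println(DetectUploadEncoding("multipart/form-data; boundary=x"))
	fmt.Println(DetectUploadEncoding("application/octet-stream"))
}
```

With names like these, an entry point's doc comment could state which encoding it handles, rather than leaving readers to infer it from a name such as Accelerate.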

Proposal

The task here would be:

  • Review Workhorse upload packages and routines and refactor them so that:
  • Add extensive code comments to guide new contributions