Document and refactor Workhorse upload routines
Problem
The upload logic in Workhorse is hard to follow, and even harder to contribute to.
We have adopted a certain domain language around uploads, such as "direct upload", "disk buffering" and "upload encodings":
https://docs.gitlab.com/ee/development/uploads.html
However, these terms are barely reflected in the code, and nearly all code modules participating in upload logic are undocumented. In fact, a full-text search reveals that the terms "direct upload" and "disk buffering" are nowhere to be found in Workhorse.
There appear to be 3 packages involved in uploads overall:
upload
filestore
objectstore
These are also not well documented, and I'm not sure responsibilities are clear, since, for instance, filestore contains routines related to uploading files, which I would expect to be in upload instead.
Function names are also often unclear. For instance, there appear to be 2 primary functions we use to connect upload endpoints:
upload.BodyUploader
upload.Accelerate
What does Accelerate mean? Are body uploads not accelerated? Looking at its implementation, it appears to be specific to multipart uploads more than anything, i.e. as opposed to using the body encoding. This looks like a false dichotomy on the surface.
Proposal
The task here would be:
- Review Workhorse upload packages and routines and refactor them so that they:
  - Reflect the common domain language used (see Ubiquitous Language in Domain Driven Design).
  - Use Intention Revealing Interfaces, with types and routines renamed accordingly.
- Add extensive code comments to guide new contributions:
  - At the package level 👉 !80292 (comment 869937483)
  - In routines, unless Intention Revealing Names make their purpose obvious.
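To make the proposal more concrete, here is a minimal sketch of what domain-language names plus package-level and routine-level comments could look like. Every identifier in it (Encoding, Technique, Route, Describe) is hypothetical and does not exist in Workhorse today; the one-line definitions of "disk buffering" and "direct upload" are my reading of the uploads documentation linked above, not taken from the code.

```go
// Hypothetical sketch only: none of these names exist in Workhorse.
// In a real refactoring this would be the doc comment of the upload
// package, explaining the domain terms once, at the package level.
package main

import "fmt"

// Encoding is the "upload encoding": how a file travels inside the request.
type Encoding int

const (
	// BodyEncoding means the request body is the file itself.
	BodyEncoding Encoding = iota
	// MultipartEncoding means the file is a part of a multipart/form-data body.
	MultipartEncoding
)

// String returns the domain term for the encoding.
func (e Encoding) String() string {
	switch e {
	case BodyEncoding:
		return "body"
	case MultipartEncoding:
		return "multipart"
	default:
		return "unknown"
	}
}

// Technique is where the file is put before the request is handed on.
type Technique int

const (
	// DiskBuffering: assumed to mean the file is first written to local disk.
	DiskBuffering Technique = iota
	// DirectUpload: assumed to mean the file goes straight to object storage.
	DirectUpload
)

// String returns the domain term for the technique.
func (t Technique) String() string {
	switch t {
	case DiskBuffering:
		return "disk buffering"
	case DirectUpload:
		return "direct upload"
	default:
		return "unknown"
	}
}

// Route describes one upload endpoint purely in domain terms, so that
// encoding and technique are two independent, visible choices.
type Route struct {
	Path      string
	Encoding  Encoding
	Technique Technique
}

// Describe renders the route using the shared domain language.
func (r Route) Describe() string {
	return fmt.Sprintf("%s: %s encoding, %s", r.Path, r.Encoding, r.Technique)
}

func main() {
	r := Route{Path: "/example/upload", Encoding: MultipartEncoding, Technique: DirectUpload}
	fmt.Println(r.Describe())
}
```

With names like these, a full-text search for "direct upload" or "disk buffering" would land on documented code, and the question of whether body uploads are "accelerated" would not come up, because encoding and technique are named separately.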