Document and refactor Workhorse upload routines
Problem
The upload logic in Workhorse is hard to follow, and harder still to contribute to.
We have adopted a certain domain language around uploads, such as "direct upload", "disk buffering" and "upload encodings":
https://docs.gitlab.com/ee/development/uploads.html
However, these terms are insufficiently reflected in code, and basically all code modules participating in upload logic are undocumented. In fact, a full text search reveals that the terms "direct upload" and "disk buffering" are nowhere to be found in Workhorse.
There appear to be 3 packages involved in uploads overall:
- upload
- filestore
- objectstore
These are also not well documented, and I'm not sure their responsibilities are clearly separated: for instance, filestore contains routines related to uploading files, which I would expect to find in upload instead.
Function names are also often unclear. For instance, there appear to be 2 primary functions we use to connect upload endpoints:
- upload.BodyUploader
- upload.Accelerate
What does Accelerate mean? Are body uploads not accelerated? Looking at its implementation, it appears to be specific to multipart uploads more than anything, i.e. as opposed to using the body encoding. This looks like a false dichotomy on the surface.
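To make the naming concern concrete, here is a minimal Go sketch of names that say which upload encoding is meant, rather than whether something is "accelerated". Everything in it (UploadEncoding, BodyEncoding, MultipartEncoding, EncodingFor) is hypothetical and does not exist in Workhorse. Classifying by Content-Type is only an assumption made to keep the example self-contained; it is not a claim about how Workhorse chooses a handler.

```go
package main

import (
	"fmt"
	"mime"
)

// UploadEncoding names how file contents are carried in a request.
// Hypothetical type for illustration; not part of Workhorse.
type UploadEncoding int

const (
	// BodyEncoding means the request body is the file itself.
	BodyEncoding UploadEncoding = iota
	// MultipartEncoding means files are parts of a multipart/form-data body.
	MultipartEncoding
)

// String returns the domain term for the encoding.
func (e UploadEncoding) String() string {
	if e == MultipartEncoding {
		return "multipart"
	}
	return "body"
}

// EncodingFor classifies a Content-Type header value. Anything that is not
// multipart/form-data (including an unparsable value) counts as body encoding.
func EncodingFor(contentType string) UploadEncoding {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err == nil && mediaType == "multipart/form-data" {
		return MultipartEncoding
	}
	return BodyEncoding
}

func main() {
	fmt.Println(EncodingFor("multipart/form-data; boundary=x"))
	fmt.Println(EncodingFor("application/octet-stream"))
}
```

With names like these, the two entry points would read as two encodings of the same operation, and the question "are body uploads not accelerated?" would not come up.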
Proposal
The task here would be:
- Review Workhorse upload packages and routines and refactor them so that:
  - They reflect common domain language used (see Ubiquitous Language in Domain Driven Design).
  - Use Intention Revealing Interfaces and rename types and routines accordingly.
- Add extensive code comments to guide new contributions
  - At the package level 👉 !80292 (comment 869937483)
  - In routines, unless Intention Revealing Names make their purpose obvious.
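As a rough illustration of the two comment levels listed above, the Go sketch below pairs a package-level doc comment that spells out the domain terms with a routine whose name should need little explanation. The Strategy type, its constants and StrategyFor are hypothetical, and so is the single boolean setting that drives the choice. The one-line glosses of "disk buffering" and "direct upload" are my paraphrase; the uploads development documentation linked above remains authoritative.

```go
// Package upload (named main here only so the sketch compiles on its own)
// would handle file uploads passing through Workhorse.
//
// Domain terms, paraphrased from the uploads development documentation:
//
//   - Disk buffering: the file is written to disk before the request is
//     handed on to Rails.
//   - Direct upload: the file is sent straight to object storage.
package main

import "fmt"

// Strategy names where an uploaded file ends up before the request reaches
// Rails. Hypothetical type, for illustration only.
type Strategy string

const (
	DiskBuffering Strategy = "disk buffering"
	DirectUpload  Strategy = "direct upload"
)

// StrategyFor picks a strategy from one assumed setting: whether object
// storage is configured for the upload in question.
func StrategyFor(objectStorageConfigured bool) Strategy {
	if objectStorageConfigured {
		return DirectUpload
	}
	return DiskBuffering
}

func main() {
	fmt.Println(StrategyFor(true))
	fmt.Println(StrategyFor(false))
}
```

A side effect of writing the terms into identifiers and package comments is that a full text search for "direct upload" or "disk buffering" would then find the relevant code.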