Merge gitlab-workhorse codebase (and possibly others) into gitlab-ce. Or: monorepo! monorepo! monorepo!
Current status
After my holidays (so ~7th Jan), I'll create a merge request that will git subtree workhorse into the gitlab-ce repository (so keeping full history), and attempt to solve the pipeline problem. If I can, I'll see about creating MRs for CNG, GDK and omnibus that will use monorepo'd workhorse. It won't be a large investment of time, and it will allow us to see whether the change is going to be worth it or not. Obviously, merging those MRs will require sign-off from other interested parties, gitlab maintainers, etc. I don't intend to force this through, but I do think it's difficult to discuss it in the abstract.
Problem to solve
Kind-of a follow-up to gitlab-workhorse#191 (closed)
Workhorse is very tightly integrated with gitlab-rails, and seems set to have a long future. We often want to make changes between the two in lockstep. Doing so involves more ceremony than I enjoy, especially when a security release is involved, and even more so in the (currently theoretical) case of a security issue in a gitlab-managed dependency of gitlab-workhorse.
In that nightmare scenario, we need to:
- Update the dependency according to semver, create new tags for each affected version
- Update gitlab-workhorse to vendor the new versions, create new tags for each affected version
- Update gitlab-ce to make changes and bump GITLAB_WORKHORSE_VERSION in each affected version
- Update gitlab-ee to port conflicting CE changes to EE
That could be 16 MRs! Currently, we're regularly handling security releases that need 8 MRs when they involve pages or workhorse. It's very painful, and we also have problems where a new non-security release of pages or workhorse in the meantime means we have to re-do much of that work.
Further details
This much process and overhead for what could potentially be a one-line change is expensive, and reduces our velocity. It also means we might be tempted to pretend some things aren't security issues when they actually are, to avoid the overhead.
Proposal
We could go full monorepo and put the gitlab-pages and gitlab-workhorse codebases into the gitlab-ce repository. Perhaps also the gitlab-shell repository. We could also put the gitlab-elasticsearch-indexer codebase in there (it's MIT anyway). Gitaly, the new labkit, many things...
So we might end up with a repository like this, thinking pragmatically:
gitlab-ce/
  app/          # ... normal rails stuff
  bin/
  lib/
  vendor/
  components/   # name is totally up for discussion
    elasticsearch-indexer/
    gitaly/
    labkit/
    pages/
    shell/
    workhorse/
The principal upside of this is that we need less ceremony to make a new release. A change that affects both workhorse and gitlab-ce can be made in a single MR, and we don't have to maintain a semver interface to workhorse anymore. We get it, and the monthly workhorse/pages releases we currently do manually, for free. All the existing gitlab-ce processes, tooling and magic will start to apply to these projects in a way that they don't at present.
It also becomes easier to contribute. Community contributions involving these satellite repositories do happen, but they're relatively rare, and often require a lot of coaching to get the two halves of the thing working.
A downside is that the CI pipelines suddenly become more complicated. We have tests to run for several components. It's possible that .gitlab-ci.yml gives us the flexibility to avoid this - run only workhorse tests if all the changes are localised to components/workhorse, for instance.
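To make the "run only workhorse tests" idea concrete, here is a minimal sketch of the selection logic in Python. It is not how .gitlab-ci.yml would express it, and the names `COMPONENTS`, `COMPONENTS_ROOT` and `suites_to_run` are hypothetical. It assumes the components/ layout proposed above and treats anything outside a known component directory as a rails change.

```python
from pathlib import PurePosixPath

# Hypothetical names, taken from the directory layout proposed above.
COMPONENTS_ROOT = "components"  # the directory name is up for discussion
COMPONENTS = (
    "elasticsearch-indexer",
    "gitaly",
    "labkit",
    "pages",
    "shell",
    "workhorse",
)


def suites_to_run(changed_paths):
    """Return the set of test suites needed for a list of changed file paths.

    A path under components/<name>/ selects that component's suite.
    Every other path is assumed to need the rails suite.
    """
    suites = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        in_component = (
            len(parts) >= 3
            and parts[0] == COMPONENTS_ROOT
            and parts[1] in COMPONENTS
        )
        suites.add(parts[1] if in_component else "rails")
    return suites


if __name__ == "__main__":
    # A change localised to workhorse needs only the workhorse suite.
    print(suites_to_run(["components/workhorse/main.go"]))
    # A lockstep change touching both halves needs both suites.
    print(suites_to_run(["components/workhorse/main.go", "app/models/user.rb"]))
```

A real pipeline would probably also want rails changes to trigger some component integration tests, since the two are tightly coupled; that is left out of the sketch.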
There's a mismatch between "project" as an engineering project, and "project" as a codebase. I don't think that every codebase we have needs to be a separate engineering project. In the past, we've merged other repositories into the gitlab-ce one; I think there's scope to do so here.
What does success look like, and how can we measure that?
Reduce the overhead of a security release in a gitlab-controlled component that gitlab-rails relies on from 4N MRs to 4 MRs
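The arithmetic behind that target can be sketched as follows. This is one reading of the numbers, not something stated explicitly above: it assumes the constant 4 is the number of affected versions needing an MR each, and N is the number of repositories in the chain. The function name and parameters are hypothetical.

```python
def merge_requests_needed(repositories, affected_versions=4):
    """MRs for one security fix: one per repository per affected version.

    Assumption: affected_versions=4 is inferred from the "4N" figure,
    with N read as the number of repositories involved.
    """
    return repositories * affected_versions


if __name__ == "__main__":
    # Dependency, workhorse, gitlab-ce and gitlab-ee: the nightmare scenario.
    print(merge_requests_needed(4))
    # Workhorse (or pages) plus gitlab-ce: today's regular security release.
    print(merge_requests_needed(2))
    # Everything in one repository: the target.
    print(merge_requests_needed(1))
```

Under that reading, the figures quoted earlier (16 MRs in the worst case, 8 for a pages or workhorse security release) fall out of the same formula.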
Links / references
I've added team labels appropriate to the projects I've mentioned above, hoping that people who are interested subscribe. Also CCing: @gitlab-org/maintainers/rails-backend @tommy.morgan @dhavens @edjdev @andrewn