Approval Request: Monorepo approach for www-gitlab-com
TL;DR
After careful consideration of the short-term and long-term objectives of the different stakeholders of the https://about.gitlab.com website (primarily Marketing and the Static Site Editor group for now), we propose restructuring the www-gitlab-com repository as a monorepo.
This approach offers the best compromise between short-term needs and long-term objectives. It's a two-way door decision: we can revert to our current structure if need be, and we keep the flexibility to move projects into separate repositories down the line.
Overview
Historically, the pipeline for the website has been slow (30+ minutes), causing not only frustration but also business risk when something needs to be deployed to the website urgently. With over 1,000 employees, the frequency of changes to the handbook has increased dramatically over time, adding pressure on our ability to deliver quick updates to other areas of the site.
Improvements over the last 3 months (see dashboard in Periscope) have brought the average time down to around 13 minutes. However, this time still needs to come down further, and Marketing has raised other concerns as well.
Here are the main goals from Marketing, based on Todd Barr's comment, which also apply to the handbook and other areas of the https://about.gitlab.com website:
- Modern publish speed: less than 5 minutes from publish to production. When we need to do a critical update (blog post, usually), it needs to reliably publish in less than 5 minutes. Faster is even better. It can't be on a merge train following a bunch of handbook posts.
- User-friendly, CMS-like interface for blog editors/publishers.
- Marketing non-reliance on engineering for blog and marketing website. In 2019, it is not standard to need to rely on an engineering organization to publish standard websites.
Technical infrastructure approach options matrix
Approach | Pipeline time | Ease of editing | Engineering dependency | Time cost to implement | Pros | Cons |
---|---|---|---|---|---|---|
Single repo, single project | Slow pipeline | Same editing experience across all sections | Risk of unrelated areas causing issues | None (this is the current state) | All common tools and processes are shared. Maintenance efforts and infrastructure improvements are shared on the project. | Less flexibility for projects to be able to diverge in terms of technical approach. Scaling issues with regard to the number of people making concurrent updates. Unrelated areas can cause issues in pipelines. |
Monorepo, multiple projects (separate pipelines) | Fast pipeline, shared merge train between all projects in the repo | Freedom to implement different editing experience | Some isolation between projects to prevent unrelated areas causing issues | Separate pipelines could be achieved without restructuring the code. Achieving complete monorepo project isolation will require a larger effort to restructure the code. This is largely already done for the blog. | Same benefits as single repo above. Project isolation and separate pipelines for each project. All common tools and processes are shared but can be customized and opt-in/opt-out per project. Easier separation of responsibility. Can make changes across the blog, website and handbook in 1 merge request. | More risk/effort for projects to be able to make drastic changes in tools and processes without impacting other areas. Repo size is still going to be large and will continue to grow much faster than a single repo. Unrelated areas could still cause issues in pipelines. |
Separate repo | Fast pipeline, separate merge train | Freedom to implement different editing experience | Isolation between projects to prevent unrelated areas causing issues | Blog is close to being able to be in a separate repo. Moving the handbook into its own repo would have a high time cost (2 weeks). | No risk of breaking unrelated projects, even for major restructuring or refactoring. Conceptually easier to work on. Can be optimized specific to its need. | Much more work and processes needed to benefit from shared processes and tools. Duplication of work. This does not necessarily solve all of the repo size issues (existing large files will still exist and continue to be added in the future). |
Technical Recommendation: Monorepo
How this approach meets the objectives
Based on the analysis in the matrix above, the monorepo approach offers the best chance to move towards goals 1 and 3:
- In a shorter time frame
- With lower risk - both short term and longer term
- More iteratively, rather than wholesale changes.
- We will still be able to benefit from sharing of common infrastructure, tools and processes
- It does not lock us in to this approach long term; it is a two-way door. It would be trivial to move a project from the monorepo into its own repo.
- Conversely, the separate repo approach results in significantly more long-term risk due to lock-in to the cons mentioned for a separate repo. It is not a two-way door.
Whether we go with a monorepo or separate repos does not affect the feasibility of goal 2. Achieving a true CMS-like editing experience on the blog/handbook is a large undertaking and not something that can be solved in a week or two. The Static Site Editor group aims to have a viable solution in place for this by FY21 Q2; however, there are features landing in the interim that should help improve the editing experience for non-technical team members.
Development of a Side-by-side preview while editing markdown in the Web IDE is scheduled to start in Milestone 12.8, which could provide immediate benefits to the editing experience:
- No need to run the project locally as editing can be achieved straight in GitLab using the Web IDE
- Previewing changes to the markdown file right next to where you are editing it.
As for goal 2, there are potential stopgap solutions, such as integrating Netlify CMS into our existing processes. Lauren has already spiked on this.
Example repo structure
A monorepo is simply a repository where the code for many projects is stored in the same place, instead of in separate repositories.
Currently the www-gitlab-com repo is a single Middleman project. In the monorepo approach, each site (or, in our case, each namespace) will be its own Middleman project.
/
├── projects/
| ├── about/
| | ├── ... //marketing website files deployed to about.gitlab.com
| ├── blog/
| | ├── ... //blog website files deployed to about.gitlab.com/blog/
| ├── handbook/
| | ├── ... //handbook website files deployed to about.gitlab.com/handbook/
| ├── jobs/
| | ├── ... //jobs website files deployed to about.gitlab.com/jobs/
├── config/
| ├── about-pipeline.yml
| ├── blog-pipeline.yml
| ├── handbook-pipeline.yml
| ├── jobs-pipeline.yml
├── README.md
├── .gitlab-ci.yml
├── ...
Note: this is just an example structure for demonstration purposes
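
To make the link between the example structure and separate pipelines more concrete, here is a minimal sketch of what a root `.gitlab-ci.yml` could look like. It assumes the Parent/Child pipelines feature (`trigger: include`) is available and that each project's pipeline lives in the `config/` files shown above. The job names, the `trigger` stage name, and the path globs are hypothetical and only for illustration; this is not the final configuration.

```yaml
# Root .gitlab-ci.yml (illustrative sketch only; job and stage names are hypothetical)
stages:
  - trigger

# Each job starts a child pipeline only when files under its project change,
# so a handbook edit does not have to run the blog pipeline and vice versa.
blog:
  stage: trigger
  trigger:
    include: config/blog-pipeline.yml      # child pipeline file from the example tree
    strategy: depend                       # parent job status mirrors the child pipeline
  rules:
    - changes:
        - projects/blog/**/*

handbook:
  stage: trigger
  trigger:
    include: config/handbook-pipeline.yml
    strategy: depend
  rules:
    - changes:
        - projects/handbook/**/*
```

The `about` and `jobs` projects would follow the same pattern. Whether `rules: changes` or another mechanism is used to select child pipelines is a design choice that would need to be validated against the GitLab version in use.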
Drawbacks
Addressing the possible drawbacks of the monorepo approach:
- Shared merge train between all projects in the repo
  - If a merge request needs to be deployed as a priority, it can still bypass the train and be merged directly to master.
  - This has a negative impact on MRs queued up in the train, but since it would only happen under exceptional circumstances, it is a feasible option and we can deal with the impact as needed.
- Large repo size
  - The primary cause of the large repo size is media and other assets.
  - There are various options available to us to handle this so that it doesn’t have a negative impact on pipeline speeds or the repo size.
- Unrelated areas could still cause issues in pipelines
  - While this risk is drastically reduced compared to the single repo and project approach we currently have, there is still a small chance that someone could make a configuration change that impacts an unrelated area.
  - To mitigate this, we will introduce a code review process and require specific approval for changes to files in the repo that could cause issues if not handled with care.
Timeline
A key dependency for delivering the benefits of the monorepo approach is the Parent/Child pipelines feature, scheduled for completion in 12.7.
Assuming the feature is delivered on time, the timeline for delivering the first iteration of the monorepo could look like this:
- 13 Jan - 22 Jan: Prepare project for monorepo structure
- 22 Jan: Parent/Child pipeline feature becomes available
- 23 Jan - 31 Jan: Finalise project pipeline configuration and move the blog into a separate project in the monorepo
- 3 Feb: New monorepo finalised and blog operating on its own pipeline
Further improvements
Here are links to the immediate, specific improvements which we can start on right now to further reduce pipeline build times:
Larger scope
- Manually do Incremental Builds
- Implement middleman external pipeline using Webpack or Gulp for assets
- Discuss coming up with a process for bypassing the merge train for scenarios in which we need to update the site ASAP (e.g. recent instances involving telemetry, or who we do business with).
- More powerful runners for some jobs
- Restructure Pipelines to Independently Build and Publish Different Parts of the Site
Smaller scope
- Further Parallelize Build Steps
- Disable unnecessary jobs in pipeline when merging to master
- Disable all remaining unnecessary jobs in pipeline when merging to master
- Cache node_modules
- Short Circuit bundle install with bundle check
- Replace "dependencies" with "needs"
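
As a rough illustration of the last three items (caching `node_modules`, short-circuiting `bundle install` with `bundle check`, and using `needs`), a job definition could look something like the sketch below. The job names, stage names, cache key, and the use of `yarn` are assumptions made for the example and do not reflect the actual pipeline configuration.

```yaml
# Illustrative fragment only; job names, stages, and commands are assumptions.
stages:
  - prepare
  - build

install_dependencies:
  stage: prepare
  cache:
    key: "$CI_COMMIT_REF_SLUG"           # one cache per branch
    paths:
      - node_modules/                    # "Cache node_modules"
  script:
    - bundle check || bundle install     # skip the install when gems are already satisfied
    - yarn install --frozen-lockfile

build_site:
  stage: build
  needs: ["install_dependencies"]        # start as soon as this job finishes, not the whole stage
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
    policy: pull                         # reuse the cache without re-uploading it
  script:
    - bundle check || bundle install
    - bundle exec middleman build
```

With `dependencies`, a job still waits for every job in the earlier stages to finish; `needs` lets it start as soon as the listed jobs are done, which is where the time saving would come from.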
Approvals
We should aim to have a decision made on this by Wed, 15 Jan 2020 to be able to meet the indicated timeline.
- @ebrinkman (Director of Product, Dev)
- @timzallmann (Director of Engineering, Dev)
- @ericschurter (Senior PM, Static Site Editor)
- @sfwgitlab (VP, Product)
- @sbouchard1 (Director of Brand and Digital Design)
- @tbarr (CMO)
- @sytses (CEO)