WIP: Middleman symlink-less file sharing
NOTE: This MR is complete and closed without merge. All incomplete tasks related to the handbook/engineering
and handbook/marketing
middleman build have been moved to the related monorepo issue, and all other incomplete or unrelated tasks have been moved to the top-level Handbook Improvement epic
What does this MR do?
This was a spike/exploration MR to iterate on what approaches and prerequisite tasks were needed. The actual work was done in #6689 (closed) and various other supporting MRs linked below.
There are issues with the symlink-based approach which was taken when doing this for the blog in !39183 (merged), because it has downsides:
- it's confusing for people to see two copies of the same shared file in their editors
- if you pick a symlink instead of the original file, some things don't work (e.g. git history in RubyMine)
So, if possible, we should avoid doing this four more times (which would result in people seeing six copies of all the shared files in their editors).
Instead, this MR will explore using the approach described here: https://github.com/middleman/middleman/issues/1902#issuecomment-286554690 (NOTE: I think the "duplicate partials path" problem mentioned there is not a real problem. I think that's just a side effect of the way Middleman looks for every file in every possible directory if it's not found - as long as I got destination_dir: "."
correct as shown below, it worked for me)
files.watch
Proof of concept for of including common content from outside the Middleman root dir with If you make this change locally (note it deletes the layouts
dir symlink), then run NO_CONTRACTS=true bundle exec middleman
from the sites/blog
directory, it will work (the layouts are found).
In theory, this allows us to have the monorepo grouped structure, where sites/handbook
contains a custom middleman config and only group-specific content (as the blog already does) but without needing symlinks.
Index: sites/blog/config.rb
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- sites/blog/config.rb (revision 71fc78eccdd1a7f545c730299532aa874ca1f2cf)
+++ sites/blog/config.rb (date 1589263865116)
@@ -96,3 +96,11 @@
ignore '/category.html'
ignore '/.gitattributes'
ignore '**/.gitkeep'
+
+# Monorepo setup to access common files
+
+files.watch(
+ :source,
+ path: File.expand_path('/Users/cwoolley/workspace/www-gitlab-com/source/layouts'),
+ destination_dir: "."
+)
\ No newline at end of file
diff --git sites/blog/source/layouts sites/blog/source/layouts
deleted file mode 100644
index e473c0618d580a4faf77c3ea3c38f70e061b5d96..0000000000000000000000000000000000000000
GIT binary patch
literal 0
Hc$@<O00001
Proof of concept of conditionally including composable Middleman Config files to implement sub-builds
(warning: brain dump ahead, needs organization and clarification)
Calling a sub-config from the top-level config
UPDATE: We probably won't end up using this approach of "composable config files", at least for now - the main reason being that we can't yet get rid of "PartialBuild" and the monolithic build approach it entails, so we will split things up in different ways. But, it's good to know that this is possible if we need it in the future*
The top-level config.rb
can include other config files like this:
config_all_rb = File.join(Middleman::Application.root, 'config_all.rb')
instance_eval(File.read(config_all_rb), config_all_rb, 1)
This seems to work fine in dev. However, it is unknown whether more complex files can be successfully composed in this way.
What does this mean for us?
So, this could be used to conditionally include or exclude individual sub-builds based on some criteria. This opens up many possible options which we have been wanting.
For example to start with:
- In dev and master, we could by default include all sub-builds.
- But for MRs, we could conditionally only run the sub-builds based on which source files changed, which could be done by default on MRs (for speed) and via opt-in in development. The non-built files wouldn't be present, but we could try to work around that by "priming" the deployable artifacts with the latest artifacts from master.
- There would also be some sub-builds which would always run, because they are always required. For example, certain common CSS or javascripts, and the top-level index page.
- This approach should greatly speed up MR pipelines
⏩
...and it opens up for future optimizations:
- Once we have this in place, we would finally be in a position (now that we have converted to rules) to be able to speed up the master build a lot by only building what has changed.
- We would still have to deal with the "rsync deletion" problem, where master deploys the entire site to the prod bucket with the gcp rsync
--delete
option, so it will remove any deleted files, but also delete any files that weren't built in the current pipeline. -
BUT, we could remove the deletion flag from normal master pipelines which only build what changed, then have a periodic scheduled job (every hour or day) which does a full build and deploys with the rsync
--delete
option. - This approach should greatly speed up
master
pipelines⏩
Proof of concept spike tasks
Rough outline of tasks to finish proof of concept validation of the approaches described above. May not need to do all of these, or do them completely to be confident of the approach.
Part 1
Dedicated handbook partial build job. Note - this does not have the responsibility of building the asset bundles - that will remain at the top level for now. In the future, it can be delegated to individual sites to build their own bundles.
- [-]
Move example handbook sources (handbook/engineering
andhandbook/marketing
) to/sites/handbook/source
- [-]
Ensure that that all files under(UPDATE: Maybe the lfs files aren't even used, and can be deleted. See !50398 (comment 345482002))git lfs ls-files
are re-added to.gitattributes
withgit lfs track ...
BEFORE committing the move. - NOTE: In order to be more incremental, temporary symlinks to
engineering
andmarketing
will be used. This will allow debugging, testing, and deployment of the sub-site build without dealing with ongoing conflicts due to constant changes to the source files. For reference, the original (detached) commit which actually moved the files is available here. -
Symlink handbook/engineering
andhandbook/marketing
to/sites/handbook/source
- [-]
-
Make a /sites/handbook/source/config.rb
which:- [-]
Serves pages successfully viaThis was abandoned as a goal, because it requires more files than necessary to be pulled in. Instead, the build is designed behave inmiddleman
dev serverbuild
mode/environment identically to the current corresponding "Job 5" build which is run byPartialBuild
. -
Successfully builds via middleman build
- Detailed tasks (not a complete list):
- Exclude what isn't needed
- Includes dependencies via
files/watch
- ...
- [-]
-
Fix any other broken relative dependencies which still live at root -
Switch to require_relative '../../path/to/dependency/from/root'
-
Fix any other broken relative references (see !50401 (merged))
-
-
Dedicated build_handbook
job to build only them and add to artifacts -
Modify handbook section in PartialBuild
to be a no-op -
Ensure handbook content is published in review app. -
Do a directory-level diff of the public
dir generated by the existing build and new build, and they should be completely identical:- existing:
git clean -dfx public && CI_NODE_INDEX=5 CI_NODE_TOTAL=9 bundle exec middleman build --bail
- new (from
sites/handbook
):git clean -dfx ../../public && time RAILS_ENV=test NO_CONTRACTS=true bundle exec middleman build --bail
- existing:
-
See if any of the config can be eliminated by using a manipulate_resource_list
extension likePartialBuild
. ANSWER: Yes, they can, so we don't need to useconfig[:ignored_sitemap_matchers]
as long as we don't put non-top-level-pages under the dirs included bymanipulate_resource_list
Part 2
This step and the following steps are currently tentative, we may not do them this way, not do them yet, or do them at all
Handbook-only builds for MRs
- [-]
MakeDon't try to do this in the first pass - it's complicated and too much chance for error. Just keep building everything always, but with the sub-build.build_handbook
job be the only one run if the only changes are to the files in the handbook sub-site (via rules) -
"Prime" the rest of the full content to the review server with the artifacts from master. Doing this in a separate MR: !50598 (merged) - NOTE: This may not currently be possible/reliable due to this bug
- Do this in a separate job that is not blocked by the
build
ortest
stages, to speed up the initial population, and make thedeploy
job dependent upon theprime
job. -
Come up with a better name for the "prime" job. deploy_latest_master_to_review
? Yes,review-prep
. - See https://docs.gitlab.com/ee/ci/pipelines/job_artifacts.html#downloading-the-latest-artifacts
-
Will need to add ancan't do this, it's over the 10M artifact upload size limit. Have to download the individual build jobs' artifacts insteadartifacts:paths
entry to masterdeploy
job to upload artifacts.
-
Update the review-prep
job to match the actual build jobs being invoked. -
Remove HANDBOOK_ENGINEERING_MARKETING
case fromPartialBuild
(but leave thehandbook_engineering_marketing?
because other methods rely on it) -
Remove all commented code in sites/handbook/config.rb
NOTE: For MR branches which are very out-of-date, review-prep
job may cause inconsistencies or failures, when trying to run the older site code against the newer "primed" artifacts from master. However, the current "blog post" subset of the build doesn't even deploy non-blog files at all, so the experience shouldn't be any worse than that.
Part 3
Incorporate handbook partial build into master
-
Make a rake build_all
job so we can still test building everything from local dev -
Make master build work just like the MR build: building handbook engineering/marketing is the responsibility of the build-handbook
job, and PartialBuild for this partial is a no-op.
Part 4
TESTING
Dev environment should still work:
-
Ensure that the top-level build still works for all content, with hot-reloading, asset rebuilds, etc. -
Test it with files actually moved, not just symlinked. May need to add files.watch
to top-level build in order for it to work.
-
Make sure master works:
-
Fork it, and test behavior on master branch of the fork.
Part 5
-
Notify people to close out any pending MRs that touch files to be moved, or be prepared to resolve conflicts.
=========================================================================
END OF SPIKE as of 2020-06-02 - See the main issue for remaining/updated tasks and work.
=========================================================================
- [-] Remove the symlink and actually move the files, then merge quickly.
- [-] Merge to master.
Part 6
Convert blog to this approach
Note that the current partial build for the blog doesn't save much time on the overall build stage, at most a minute or so. The review step IS much slower, but this is because it is uploading everything. If we can use one or more of the approaches above of "priming" the review app bucket and/or skipping builds that aren't needed, these times should be equivalent.
- full build: https://gitlab.com/gitlab-com/www-gitlab-com/pipelines/147325082/builds
- blog-only build: https://gitlab.com/gitlab-com/www-gitlab-com/pipelines/147321342/builds
Other related/prerequisite tasks
-
Make the master/MR deploy jobs download bin/combine-sitemaps
instead of requiring them to be passed by every individual build job's artifacts. (separate MR: !51084 (merged))- This decouples the deploy job from the artifacts, and enables other optimizations such as not passing along unnecessary artifacts.
-
Clean up build_base
job anchor logic. (separate MR: !51092 (merged)) Benefits:- It's hard to reason about necessary changes and improvements to the build jobs with all of the logic currently going on in nested bash conditionals/etc.
- Try to clean it up, with the goal of having no multi-line commands or conditionals within the
script
tag.
-
Additional refactoring of deploy jobs. (Separate MR: !51359 (merged)) -
Clean up consistency of job names: switch to all dashes rather than mix of dashes and underscores (done: !51132 (merged)) -
Fix calls to curl
andunzip
to not print verbose progress and fill up logs. (separate MR: !51343 (merged)) -
Remove bundle
from globalbefore_script
and add it to individual tasks - looks like it's used fewer places than it's not. (Separate MR: !51214 (merged)) -
Remove haml ugly:true and add minify_html for blog-only (blog post) middleman config (Separate MR: !51360 (merged)) - [-]
Do not disable HAML options validation(Separate MR: !51362 (merged). But actually it IS needed, reverted here: !51376 (merged)) -
Dedicated job to push cache, all others just pull. This will save 20-30 seconds on each job. (separate MR: !51371 (merged), and !51373 (merged)) -
Change class name in partial_build_blog.rb
fromPartialBuild
toPartialBuildBlog
(separate MR: !51374 (merged)) -
DRY up review-prep
job. Ideally there's a single source of truth for jobs run vs. artifacts to download. -
Delete Latex/Pandoc from dockerfile and related rake tasks if its no longer needed - See chat with Lukas Eipert
- Just put a redirect to the HTML version of the two PDFs
- Verify with original creators to see if this is OK.
-
Only run external pipeline to build assets via rollup in "ASSETS" job. Benefits: - Faster jobs: will shave 30+ seconds off every job.
- Less log output/noise
- Consolidates handling of assets to one place, which better positions us to switch to Webpack in the future.
-
Remove rsync deletion from main master and MR pipelines; only do it periodically on master. This has many benefits: - Avoids incidents like this where an artifact download bug caused an outage because much of the production site was deleted.
- Facilitates partial deploys of portions of the site and/or monorepo sub-sites. This will help meet other goals, such as improved performance and decoupling different portions of the repo/builds.
-
Do not pass along job artifacts from build jobs if they are identical to the most recent build. - This will improve performance of the master/MR deploy jobs; a significant portion of their time is downloading all of the generated artifacts from the build jobs, 99.9%+ of which are identical to the already and do not need to be deployed.
- This can be implemented by downloading the artifacts from the latest successful master build (as the MR pipeline job
deploy_latest_master_to_review
already does), and comparing them to the ones generated by the current build job. If they are identical, then nothing needs to be uploaded (i.e., the public dir can be emptied out).
-
Use Google Cloud SDK directly from standard docker image rather than downloading separate image managed by Google. (separate MR for the Dockerfile: gitlab-org/gitlab-build-images!295 (merged), and changes to the build: !51324 (merged)) Benefits: - Speed up jobs, because we can use the standard docker image which is more likely to already be cached on a given runner (since more jobs use it)
- Speed up jobs, because even though the standard image is bigger (about 1g), it downloads in about about 30 seconds vs 1:30 for the GCP image, because it's being pulled from the gitlab Container Registry rather than Docker Hub (apparently no intermediate caching of external images is happening other than on individual runner instances)
- Allows more options for cleaning up / DRYing up deploy scripts, because we can use Ruby instead of just python/bash (which is all the GCP docker image gives us).
- Reduces the chance of future incidents like this one where they broke the performance of the docker image, because the CRC check stopped being optimized, then caused an outage because they removed the python version without a workaround. As part of that incident, Google admitted that the docker image had remaining issues and needed cleanup.
Out-of-scope future tasks
Notes on tasks which are out of scope for this spike:
- Asset builds should be per-site. Instead of the top level build being responsible for building all asset bundles, it can be delegated to individual sites to build their own bundles.
- caching-optimized proxy resource build jobs should be set up per-site.
- Deciding what to do about redirects. Do these have to be managed monolithically, or can they be delegated to the responsible sites?
- Repeat the pattern/process taken for the handbook engineering/marketing build in this and related MRs for all other sub-sites, including switching the blog from symlinks to this approach.
- Get the monkey patch for
files.watch
into Middleman and/or Frontman.
Radical Idea:
- Get rid of the separate "deploy" job, and make each build job responsible for publishing its own artifacts. This may never be possible or desirable, because of the need to coordinate the deploy of dependent artifacts across separate jobs (e.g. js/css/image assets used by pages). But for sub-site builds which are a single build job, it would speed things up a lot, by eliminating the overhead of spinning up a separate job just to do the rsync deploy.
Other debugging notes
- Add logging to
middleman-core-4.3.3/lib/middleman-core/sources.rb
Sources#find
- Add logging to
middleman-core-4.3.3/lib/middleman-core/sources/source_watcher.rb
SourceWatcher#find
Tech Debt Cleanup Notes
Unrelated things I saw which should be cleaned up, making a note of them here for future MRs...
- [-] Fix calls to Google cloud SDK
rsync
to not print verbose progress and fill up logs. UPDATE: Looks like this may not be possible without suppressing all output including filenames: -
Move all non-top-level pages which are referenced as partials out from under source/handbook/engineering
and ``source/handbook/marketing, so by default they are not built and output to
public` and explicit `ignore` entries are not needed to prevent them being built and output.- Write a linter task to enforce this. E.g. hook into middleman and identify anything used as a
partial
, and fail the build if these exist under the list of resources allowed bymanipulate_resource_list
- Write a linter task to enforce this. E.g. hook into middleman and identify anything used as a
-
Rename the package.json
"bundle
" task and sub-scripts to something consistent and descriptive likebuild_assets
, instead of combination of other overloaded terms likebundle
,pipeline
. -
Fix this comment in config.rb
: "# Populate the direction and releases pages onlyon masterin thebuild_proxy_resource_*
jobs or development machines." -
Remove usages of autoload
, e.g. inlib/homepage.rb
, becauseautoload
is going away and requires hacking around it by adding the monorepo root to the$LOAD_PATH
in the sites'config.rb
s -
Combine rm public/sitemap.xml
intoextract_sitemap_urls
-
Eventually get rid of PartialBuild
.- It relies on assumptions that will no longer hold:
- Entire site will be built by
PartialBuild
- There will be a consistent number of partial or sub-site builds.
- Entire site will be built by
- But, it is still needed because it defines some partial builds in terms of others, e.g.
all_others?
- It relies on assumptions that will no longer hold:
-
includes/hiring-chart.html.erb
should use a stable ID rather than a random number<% unique_id = "Chart#{rand(10000..99999)}" %>
, to make it easier to do full-directory diffs ofpublic
when looking for regressions in sub-site monorepo builds.
Related issues
Relates #6689 (closed)