Skip to content

WIP: Middleman symlink-less file sharing

Chad Woolley requested to merge cwoolley-middleman-symlinkless-file-sharing into master

NOTE: This MR is complete and closed without merge. All incomplete tasks related to the handbook/engineering and handbook/marketing middleman build have been moved to the related monorepo issue, and all other incomplete or unrelated tasks have been moved to the top-level Handbook Improvement epic

What does this MR do?

This was a spike/exploration MR to iterate on what approaches and prerequisite tasks were needed. The actual work was done in #6689 (closed) and various other supporting MRs linked below.

There are issues with the symlink-based approach which was taken when doing this for the blog in !39183 (merged), because it has downsides:

  • it's confusing for people to see two copies of the same shared file in their editors
  • if you pick a symlink instead of the original file, some things don't work (e.g. git history in RubyMine)

So, if possible, we should avoid doing this four more times (which would result in people seeing six copies of all the shared files in their editors).

Instead, this MR will explore using the approach described here: https://github.com/middleman/middleman/issues/1902#issuecomment-286554690 (NOTE: I think the "duplicate partials path" problem mentioned there is not a real problem. I think that's just a side effect of the way Middleman looks for every file in every possible directory if it's not found - as long as I got destination_dir: "." correct as shown below, it worked for me)

Proof of concept for of including common content from outside the Middleman root dir with files.watch

If you make this change locally (note it deletes the layouts dir symlink), then run NO_CONTRACTS=true bundle exec middleman from the sites/blog directory, it will work (the layouts are found).

In theory, this allows us to have the monorepo grouped structure, where sites/handbook contains a custom middleman config and only group-specific content (as the blog already does) but without needing symlinks.

Index: sites/blog/config.rb
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- sites/blog/config.rb	(revision 71fc78eccdd1a7f545c730299532aa874ca1f2cf)
+++ sites/blog/config.rb	(date 1589263865116)
@@ -96,3 +96,11 @@
 ignore '/category.html'
 ignore '/.gitattributes'
 ignore '**/.gitkeep'
+
+# Monorepo setup to access common files
+
+files.watch(
+  :source,
+  path: File.expand_path('/Users/cwoolley/workspace/www-gitlab-com/source/layouts'),
+  destination_dir: "."
+)
\ No newline at end of file
diff --git sites/blog/source/layouts sites/blog/source/layouts
deleted file mode 100644
index e473c0618d580a4faf77c3ea3c38f70e061b5d96..0000000000000000000000000000000000000000
GIT binary patch
literal 0
Hc$@<O00001

Proof of concept of conditionally including composable Middleman Config files to implement sub-builds

(warning: brain dump ahead, needs organization and clarification)

Calling a sub-config from the top-level config

UPDATE: We probably won't end up using this approach of "composable config files", at least for now - the main reason being that we can't yet get rid of "PartialBuild" and the monolithic build approach it entails, so we will split things up in different ways. But, it's good to know that this is possible if we need it in the future*

The top-level config.rb can include other config files like this:

config_all_rb = File.join(Middleman::Application.root, 'config_all.rb')
instance_eval(File.read(config_all_rb), config_all_rb, 1)

This seems to work fine in dev. However, it is unknown whether more complex files can be successfully composed in this way.

What does this mean for us?

So, this could be used to conditionally include or exclude individual sub-builds based on some criteria. This opens up many possible options which we have been wanting.

For example to start with:

  • In dev and master, we could by default include all sub-builds.
  • But for MRs, we could conditionally only run the sub-builds based on which source files changed, which could be done by default on MRs (for speed) and via opt-in in development. The non-built files wouldn't be present, but we could try to work around that by "priming" the deployable artifacts with the latest artifacts from master.
  • There would also be some sub-builds which would always run, because they are always required. For example, certain common CSS or javascripts, and the top-level index page.
  • This approach should greatly speed up MR pipelines

...and it opens up for future optimizations:

  • Once we have this in place, we would finally be in a position (now that we have converted to rules) to be able to speed up the master build a lot by only building what has changed.
  • We would still have to deal with the "rsync deletion" problem, where master deploys the entire site to the prod bucket with the gcp rsync --delete option, so it will remove any deleted files, but also delete any files that weren't built in the current pipeline.
  • BUT, we could remove the deletion flag from normal master pipelines which only build what changed, then have a periodic scheduled job (every hour or day) which does a full build and deploys with the rsync --delete option.
  • This approach should greatly speed up master pipelines

Proof of concept spike tasks

Rough outline of tasks to finish proof of concept validation of the approaches described above. May not need to do all of these, or do them completely to be confident of the approach.

Part 1

Dedicated handbook partial build job. Note - this does not have the responsibility of building the asset bundles - that will remain at the top level for now. In the future, it can be delegated to individual sites to build their own bundles.

  • [-] Move example handbook sources (handbook/engineering and handbook/marketing) to /sites/handbook/source
    • [-] Ensure that that all files under git lfs ls-files are re-added to .gitattributes with git lfs track ... BEFORE committing the move. (UPDATE: Maybe the lfs files aren't even used, and can be deleted. See !50398 (comment 345482002))
    • NOTE: In order to be more incremental, temporary symlinks to engineering and marketing will be used. This will allow debugging, testing, and deployment of the sub-site build without dealing with ongoing conflicts due to constant changes to the source files. For reference, the original (detached) commit which actually moved the files is available here.
    • Symlink handbook/engineering and handbook/marketing to /sites/handbook/source
  • Make a /sites/handbook/source/config.rb which:
    • [-] Serves pages successfully via middleman dev server This was abandoned as a goal, because it requires more files than necessary to be pulled in. Instead, the build is designed behave in build mode/environment identically to the current corresponding "Job 5" build which is run by PartialBuild.
    • Successfully builds via middleman build
    • Detailed tasks (not a complete list):
      • Exclude what isn't needed
      • Includes dependencies via files/watch
      • ...
  • Fix any other broken relative dependencies which still live at root
    • Switch to require_relative '../../path/to/dependency/from/root'
    • Fix any other broken relative references (see !50401 (merged))
  • Dedicated build_handbook job to build only them and add to artifacts
  • Modify handbook section in PartialBuild to be a no-op
  • Ensure handbook content is published in review app.
  • Do a directory-level diff of the public dir generated by the existing build and new build, and they should be completely identical:
    • existing: git clean -dfx public && CI_NODE_INDEX=5 CI_NODE_TOTAL=9 bundle exec middleman build --bail
    • new (from sites/handbook): git clean -dfx ../../public && time RAILS_ENV=test NO_CONTRACTS=true bundle exec middleman build --bail
  • See if any of the config can be eliminated by using a manipulate_resource_list extension like PartialBuild. ANSWER: Yes, they can, so we don't need to use config[:ignored_sitemap_matchers] as long as we don't put non-top-level-pages under the dirs included by manipulate_resource_list

Part 2

This step and the following steps are currently tentative, we may not do them this way, not do them yet, or do them at all

Handbook-only builds for MRs

  • [-] Make build_handbook job be the only one run if the only changes are to the files in the handbook sub-site (via rules) Don't try to do this in the first pass - it's complicated and too much chance for error. Just keep building everything always, but with the sub-build.
  • "Prime" the rest of the full content to the review server with the artifacts from master. Doing this in a separate MR: !50598 (merged)
    • NOTE: This may not currently be possible/reliable due to this bug
    • Do this in a separate job that is not blocked by the build or test stages, to speed up the initial population, and make the deploy job dependent upon the prime job.
    • Come up with a better name for the "prime" job. deploy_latest_master_to_review? Yes, review-prep.
    • See https://docs.gitlab.com/ee/ci/pipelines/job_artifacts.html#downloading-the-latest-artifacts
    • Will need to add an artifacts:paths entry to master deploy job to upload artifacts. can't do this, it's over the 10M artifact upload size limit. Have to download the individual build jobs' artifacts instead
  • Update the review-prep job to match the actual build jobs being invoked.
  • Remove HANDBOOK_ENGINEERING_MARKETING case from PartialBuild (but leave the handbook_engineering_marketing? because other methods rely on it)
  • Remove all commented code in sites/handbook/config.rb

NOTE: For MR branches which are very out-of-date, review-prep job may cause inconsistencies or failures, when trying to run the older site code against the newer "primed" artifacts from master. However, the current "blog post" subset of the build doesn't even deploy non-blog files at all, so the experience shouldn't be any worse than that.

Part 3

Incorporate handbook partial build into master

  • Make a rake build_all job so we can still test building everything from local dev
  • Make master build work just like the MR build: building handbook engineering/marketing is the responsibility of the build-handbook job, and PartialBuild for this partial is a no-op.

Part 4

TESTING

Dev environment should still work:

  • Ensure that the top-level build still works for all content, with hot-reloading, asset rebuilds, etc.
    • Test it with files actually moved, not just symlinked. May need to add files.watch to top-level build in order for it to work.

Make sure master works:

  • Fork it, and test behavior on master branch of the fork.

Part 5

  • Notify people to close out any pending MRs that touch files to be moved, or be prepared to resolve conflicts.

=========================================================================

END OF SPIKE as of 2020-06-02 - See the main issue for remaining/updated tasks and work.


=========================================================================

  • [-] Remove the symlink and actually move the files, then merge quickly.
  • [-] Merge to master.

Part 6

Convert blog to this approach

Note that the current partial build for the blog doesn't save much time on the overall build stage, at most a minute or so. The review step IS much slower, but this is because it is uploading everything. If we can use one or more of the approaches above of "priming" the review app bucket and/or skipping builds that aren't needed, these times should be equivalent.


Other related/prerequisite tasks

  • Make the master/MR deploy jobs download bin/combine-sitemaps instead of requiring them to be passed by every individual build job's artifacts. (separate MR: !51084 (merged))
    • This decouples the deploy job from the artifacts, and enables other optimizations such as not passing along unnecessary artifacts.
  • Clean up build_base job anchor logic. (separate MR: !51092 (merged)) Benefits:
    • It's hard to reason about necessary changes and improvements to the build jobs with all of the logic currently going on in nested bash conditionals/etc.
    • Try to clean it up, with the goal of having no multi-line commands or conditionals within the script tag.
  • Additional refactoring of deploy jobs. (Separate MR: !51359 (merged))
  • Clean up consistency of job names: switch to all dashes rather than mix of dashes and underscores (done: !51132 (merged))
  • Fix calls to curl and unzip to not print verbose progress and fill up logs. (separate MR: !51343 (merged))
  • Remove bundle from global before_script and add it to individual tasks - looks like it's used fewer places than it's not. (Separate MR: !51214 (merged))
  • Remove haml ugly:true and add minify_html for blog-only (blog post) middleman config (Separate MR: !51360 (merged))
  • [-] Do not disable HAML options validation (Separate MR: !51362 (merged). But actually it IS needed, reverted here: !51376 (merged))
  • Dedicated job to push cache, all others just pull. This will save 20-30 seconds on each job. (separate MR: !51371 (merged), and !51373 (merged))
  • Change class name in partial_build_blog.rb from PartialBuild to PartialBuildBlog (separate MR: !51374 (merged))
  • DRY up review-prep job. Ideally there's a single source of truth for jobs run vs. artifacts to download.
  • Delete Latex/Pandoc from dockerfile and related rake tasks if its no longer needed
    • See chat with Lukas Eipert
    • Just put a redirect to the HTML version of the two PDFs
    • Verify with original creators to see if this is OK.
  • Only run external pipeline to build assets via rollup in "ASSETS" job. Benefits:
    • Faster jobs: will shave 30+ seconds off every job.
    • Less log output/noise
    • Consolidates handling of assets to one place, which better positions us to switch to Webpack in the future.
  • Remove rsync deletion from main master and MR pipelines; only do it periodically on master. This has many benefits:
    • Avoids incidents like this where an artifact download bug caused an outage because much of the production site was deleted.
    • Facilitates partial deploys of portions of the site and/or monorepo sub-sites. This will help meet other goals, such as improved performance and decoupling different portions of the repo/builds.
  • Do not pass along job artifacts from build jobs if they are identical to the most recent build.
    • This will improve performance of the master/MR deploy jobs; a significant portion of their time is downloading all of the generated artifacts from the build jobs, 99.9%+ of which are identical to the already and do not need to be deployed.
    • This can be implemented by downloading the artifacts from the latest successful master build (as the MR pipeline job deploy_latest_master_to_review already does), and comparing them to the ones generated by the current build job. If they are identical, then nothing needs to be uploaded (i.e., the public dir can be emptied out).
  • Use Google Cloud SDK directly from standard docker image rather than downloading separate image managed by Google. (separate MR for the Dockerfile: gitlab-org/gitlab-build-images!295 (merged), and changes to the build: !51324 (merged)) Benefits:
    • Speed up jobs, because we can use the standard docker image which is more likely to already be cached on a given runner (since more jobs use it)
    • Speed up jobs, because even though the standard image is bigger (about 1g), it downloads in about about 30 seconds vs 1:30 for the GCP image, because it's being pulled from the gitlab Container Registry rather than Docker Hub (apparently no intermediate caching of external images is happening other than on individual runner instances)
    • Allows more options for cleaning up / DRYing up deploy scripts, because we can use Ruby instead of just python/bash (which is all the GCP docker image gives us).
    • Reduces the chance of future incidents like this one where they broke the performance of the docker image, because the CRC check stopped being optimized, then caused an outage because they removed the python version without a workaround. As part of that incident, Google admitted that the docker image had remaining issues and needed cleanup.

Out-of-scope future tasks

Notes on tasks which are out of scope for this spike:

  • Asset builds should be per-site. Instead of the top level build being responsible for building all asset bundles, it can be delegated to individual sites to build their own bundles.
  • caching-optimized proxy resource build jobs should be set up per-site.
  • Deciding what to do about redirects. Do these have to be managed monolithically, or can they be delegated to the responsible sites?
  • Repeat the pattern/process taken for the handbook engineering/marketing build in this and related MRs for all other sub-sites, including switching the blog from symlinks to this approach.
  • Get the monkey patch for files.watch into Middleman and/or Frontman.

Radical Idea:

  • Get rid of the separate "deploy" job, and make each build job responsible for publishing its own artifacts. This may never be possible or desirable, because of the need to coordinate the deploy of dependent artifacts across separate jobs (e.g. js/css/image assets used by pages). But for sub-site builds which are a single build job, it would speed things up a lot, by eliminating the overhead of spinning up a separate job just to do the rsync deploy.

Other debugging notes

  • Add logging to middleman-core-4.3.3/lib/middleman-core/sources.rb Sources#find
  • Add logging to middleman-core-4.3.3/lib/middleman-core/sources/source_watcher.rb SourceWatcher#find

Tech Debt Cleanup Notes

Unrelated things I saw which should be cleaned up, making a note of them here for future MRs...

  • [-] Fix calls to Google cloud SDK rsync to not print verbose progress and fill up logs. UPDATE: Looks like this may not be possible without suppressing all output including filenames:
  • Move all non-top-level pages which are referenced as partials out from under source/handbook/engineering and ``source/handbook/marketing, so by default they are not built and output to public` and explicit `ignore` entries are not needed to prevent them being built and output.
    • Write a linter task to enforce this. E.g. hook into middleman and identify anything used as a partial, and fail the build if these exist under the list of resources allowed by manipulate_resource_list
  • Rename the package.json "bundle" task and sub-scripts to something consistent and descriptive like build_assets, instead of combination of other overloaded terms like bundle, pipeline.
  • Fix this comment in config.rb: "# Populate the direction and releases pages only on master in the build_proxy_resource_* jobs or development machines."
  • Remove usages of autoload, e.g. in lib/homepage.rb, because autoload is going away and requires hacking around it by adding the monorepo root to the $LOAD_PATH in the sites' config.rbs
  • Combine rm public/sitemap.xml into extract_sitemap_urls
  • Eventually get rid of PartialBuild.
    • It relies on assumptions that will no longer hold:
      1. Entire site will be built by PartialBuild
      2. There will be a consistent number of partial or sub-site builds.
    • But, it is still needed because it defines some partial builds in terms of others, e.g. all_others?
  • includes/hiring-chart.html.erb should use a stable ID rather than a random number <% unique_id = "Chart#{rand(10000..99999)}" %>, to make it easier to do full-directory diffs of public when looking for regressions in sub-site monorepo builds.

Related issues

Relates #6689 (closed)

Edited by Chad Woolley

Merge request reports