Support no-downtime upgrades via alternative documentRoot for static files
Problem
As reported in omnibus-gitlab#3895, admins upgrading GitLab via our no-downtime upgrade procedure have found that CSS and JavaScript often fail to load while the upgrade is in progress (omnibus-gitlab#3895 (comment 419060384)). In a mixed deployment behind a load balancer, either of the following can happen:
- A user loads a page from a node running version N+1, and the follow-up CSS/JS request is routed to a node running version N.
- A user loads a page from a node running version N, and the follow-up CSS/JS request is routed to a node running version N+1.
In both scenarios, the user gets a 404 because only one version of the assets exists on a given server.
Solutions
- Use a CDN. On GitLab.com, we have a canary deployment, and all asset URLs get prefixed with https://assets.gitlab-static.net. Fastly has some mechanism for retrieving the requested files from the server from which they were requested (or maybe it just caches them when it receives a 200?).
- Install the current and target version assets into all nodes, and do the upgrade. This is what gitlab$2019978 does. It's a bit kludgy because it pollutes /opt/gitlab/embedded/service/gitlab-rails/public/assets.
It's not clear to me how easy it is to support an out-of-the-box CDN for GitLab. NGINX can be configured to act like a CDN (http://linuxplayer.org/2013/06/nginx-try-files-on-multiple-named-location-or-server). It should be possible to designate a deploy node (the first node to be upgraded), and have NGINX request assets from the deploy node if it encounters a 404. However, this requires some knowledge of the customer's topology and network DNS changes.
Perhaps a simpler option is to install all the requisite assets (say, versions 12.0 to 13.0) in a directory. Currently, the NGINX config for GitLab routes all /assets requests to Workhorse, which serves files from the documentRoot parameter. We might want to add another parameter (e.g. altDocumentRoot) that Workhorse searches if it can't find the assets in the documentRoot.
That way, we can:
- Create a separate package/tarball that installs all the assets in some alternative directory that does not conflict with the Omnibus-packaged files.
- Ensure that every node has a copy of every asset needed for no-downtime upgrades.
Thoughts?