"Obviously the smaller the resources, the faster they will arrive in the browser. Currently we're not compressing HTML responses. Even with low gzip compression levels, we could be speeding up page loads, particularly on slow connections."
Proposal
Investigate if it makes sense to compress HTML responses, and if yes: do it.
Links / references
Documentation blurb
(Write the start of the documentation of this feature here, include:
Why should someone use it; what's the underlying problem.
What is the solution.
How does someone use this
During implementation, this can then be copied and used as a starter for the documentation.)
@marin, LOL... but I was hoping that the linked issue #33501 (closed) would provide the context you were seeking. The idea is to improve the performance of GitLab.com from the user's perspective by gzip-compressing the HTML that is sent back to users' browsers. It was suggested to me that this fits with the Build team, perhaps because it would require packaging a compression tool to run in some logical step of the process (Workhorse?).
@andrewn can you provide further insight into your original point / idea, since I don't seem to be doing it justice?
My original point was that we should lightly compress dynamically generated text/JSON/HTML data on its way out of nginx using the `gzip on` directive.
Additionally, AFAIK, assets are being served up from Workhorse.
Is there a reason that we don't serve these up directly from nginx?
Once/if we're serving assets directly from nginx, we should use gzip_static on static assets and pre-compress them with gzip -9 during the build process.
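Roughly, both suggestions together would look something like the sketch below (the asset root is an assumed path, not our actual config):

```nginx
# Dynamic responses: light, on-the-fly compression on the way out of nginx
gzip on;
gzip_comp_level 1;                       # light compression, low CPU cost
gzip_types application/json text/plain;  # text/html is compressed by default once gzip is on
gzip_vary on;

# Static assets: serve files pre-compressed with `gzip -9` at build time
# (requires nginx built with --with-http_gzip_static_module)
location /assets/ {
    gzip_static on;                      # serve e.g. application.js.gz when the client accepts gzip
    root /opt/gitlab/embedded/service/gitlab-rails/public;  # assumed asset root
}
```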
Is there a reason that we don't serve these up directly from nginx?
Yes. Nginx is not a hard requirement for the GitLab stack. You can switch it out for any other web server. Omnibus GitLab ships Nginx, but you can disable the bundled web server and provide your own. So Workhorse is the last component in the stack that we have control over when it comes to asset serving.
If nginx is part of the stack (as it is on GitLab.com), we can optionally get a performance gain by lightly compressing (level 1) content in nginx on its way back to the client.
Since we're moving to a CDN for GitLab.com, the amount of static asset traffic that we will be serving from gitlab.com should drop drastically. I recommend that we wait until the CDN is implemented before discussing this further.
Thanks for that quick analysis. I'm still curious to know what the gain would be for static assets without a CDN. Depending on the level of effort involved, we may get around to doing this before the CDN is up, or the CDN may benefit from it in any case?
It would be nice, if feasible, to do this at the workhorse level so that it works independent of any load balancer / proxy setup. I think there are some gems to do this.
This would also afford us more control in case we run into any issues and need to exclude certain content from compression.
@joshlambert configuring this in nginx is a few lines of configuration, whereas re-implementing it in golang involves developing and testing a lot of mission-critical code. The nginx configuration is highly configurable and can easily be turned on or off without disrupting activity.
I would say let's use nginx as a first step at least.
@ernstvn right now, nginx is forwarding static asset requests to Workhorse, so this is hard to change without writing code. However, it would be easy to terminate the static asset requests in nginx instead of forwarding them on.
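A rough sketch of what that could look like (the path and upstream name are assumptions, not our current config): terminate `/assets` in nginx and fall back to Workhorse for anything not on disk.

```nginx
location /assets/ {
    root /opt/gitlab/embedded/service/gitlab-rails/public;  # assumed asset root
    try_files $uri @workhorse;        # only files present on disk are served here
    expires max;
    add_header Cache-Control public;
}

location @workhorse {
    proxy_pass http://gitlab-workhorse;  # assumed name of the existing Workhorse upstream
}
```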
We do a similar thing in Gitter and it works very well and gives us the ability to configure precompressed gzip (and more recently brotli) assets.
Additionally, nginx is faster and more efficient than a Go implementation here. Microbenchmarking them (disclaimer: for what that's worth) shows nginx can handle about 2x as many static asset requests as Go's standard-library static file handler.
Without a CDN, we'll get a big advantage from terminating early when we can (i.e., when nginx is available). With a CDN, we'll still gain an advantage by lowering the latency on CDN cache misses.
@andrewn I get the feeling that your comments are neglecting the fact that we have to ship this to everyone. This is not only a few lines of change that can be applied on GitLab.com. That is, unless we decide as a company to treat everyone with a different proxy differently. A lot of the Workhorse work was done to remove the dependence on web-server-specific features. Correct me if my feeling is wrong :)
@marin thanks for requesting clarification! I'm aware that nginx is not available everywhere, but for those environments where it is available, I feel we should take advantage of its features.
Not every site has the same scale requirements as GitLab.com, but for those that do, I feel we should leverage nginx rather than attempting to rebuild it, feature-by-feature, in Workhorse.
What I like about this setup is that, when nginx is running, it'll serve up static content directly, without forwarding each request to Workhorse. When nginx is not available, Workhorse will continue to serve up the static content exactly as it does at present.
The downstream configuration remains the same; nginx simply offloads the static requests when it's there.
Does this make sense, or do you think I have an over-simplified view of the current configuration?
@andrewn I think we certainly all agree that this is something that will help our users, it will reduce bandwidth costs on both sides as well as increase performance and responsiveness. While smaller installations may not get as big of a bang for the buck, the performance/responsiveness benefits of faster asset downloads will still be a nice win.
In accordance with our goals of having best practices enabled for everyone and of shipping what we run on GitLab.com, we'd want to make this the default mode of operation for all of our users.
The challenge with relying on nginx for this is that there are just a huge number of ways to deploy GitLab, and some do not include it. Features and optimizations we put into a non-core service (like nginx) mean that some customers may not receive the benefit, and they further complicate our installation process. There will be more that customers have to do to optimize their reverse proxy, and we'll also have to sprinkle this across our documentation, complicating that further.
So it's going to be a tradeoff between what I think is best for customers and the ease of spinning up a "best practice" GitLab installation, versus the effort spent to add it to Workhorse and how much of a CPU cost it may impose on that layer of the service.
My concern is that we're rebuilding nginx in Workhorse, and that seems like a classic case of NIH.
If the effort involved in both approaches were of a similar magnitude, I might agree with you, but it's definitely not.
To illustrate, here's the complete Gitter nginx compression configuration. It's only a dozen lines of configuration, but it covers a huge amount of complexity...
```nginx
## Handle static asset compression differently from generated responses
## All non-image assets are served from `.gz` and `.br` files precompressed on
## the file-system, for faster response with least CPU overhead.
location /assets/ {
    ...
    brotli_static on;            # Use statically precompressed brotli assets for modern browsers (smaller than gzip)
    gzip_static on;              # Use statically precompressed gzip assets for browsers that accept compression
    gzip_http_version 1.0;       # Allow compression for HTTP/1.0 requests and later
    gzip_proxied any;            # Allow proxies to accept compressed responses
    gzip_disable "MSIE [1-6]\."; # Ignore these browsers if they request compressed assets
    gzip_vary on;                # Send `Vary: Accept-Encoding` so downstream caches store variants separately
    ...
}

## Application generated content
location /api/ {
    ...
    gzip on;                     # Compress responses on the fly
    gzip_types *;                # If there's a chance that we're serving up precompressed data, we would turn it off here
    gzip_comp_level 2;           # Configure the compression level
    gzip_proxied any;            # Allow proxies to accept compressed responses
    ...
}
```
Looking at the configuration options listed above, here's a list of things that we would need to be aware of if replacing nginx with Workhorse compression:
NB: Make sure that we're honouring ETags and HTTP conditional requests. These are very important for static assets, as telling a browser that it already has the right data (and therefore skipping the transfer entirely) is always going to be fastest.
NB: Make sure to honour the Vary header, otherwise downstream caches will serve compressed assets to clients that don't accept compression and vice versa. Also, once these caches are corrupted, it's very difficult to fix (they are not under our control).
NB: Static assets should not be compressed on the fly as the content doesn't change, so recompressing it on every request is a waste of resources, and slow.
(optional, but nice) Brotli compression (which achieves much higher compression rates) is nice to include if the browser accepts br encoding.
(optional, but useful) Don't use HTTP compression on old HTTP clients
(optional, but useful) Don't use HTTP compression for certain buggy browsers
ngx_http_gzip_filter_module.c gives you an idea of some (but not all) of the complexity involved in doing this right.
In the interests of delivering a minimum viable change, let's use nginx when we have it.
This will make a huge difference to GitLab.com customers (and also to other customers using nginx) for relatively little effort compared to rebuilding it ourselves.
Enable it to work with NGINX - can be completed by next release (?) by 1 (?) developer.
Enable it to work from Workhorse - would require (how many? 2?) developers for (how long? 1 month?) to be shippable. We'd have to consider the opportunity cost as well: what would these developers otherwise be working on?
If the estimate is 1 dev for 1 mth vs 2 devs for 1 mth, I'd say let's pick option 2, but @andrewn perhaps you want to revise my estimates upward?
Gzip can improve performance a lot, so I would also rather have it sooner, at least for some users, by going with NGINX very quickly (which brings gzip and some other nice things for web serving to the table) and replacing it at a later stage. But you have the better insight.
We should also look closely at delivering JSON through gzip.
It is not a big deal for static assets, as they should not contain security tokens.
Data such as static assets, images, and build artifacts can and probably should be compressed; everything that is dynamic (API JSON, HTML) should not be.
If the estimate is 1 dev for 1 mth vs 2 devs for 1 mth, I'd say let's pick option 2, but @andrewn perhaps you want to revise my estimates upward?
I'm sure 2 devs could deliver something that worked within a month, but let's consider the opportunity costs:
This change will benefit high-volume/high-traffic installations the most
High volume installations will almost certainly already be running nginx in front of their Workhorse setup.
Even if we implemented it in Workhorse, it still makes sense (for high-volume sites) to terminate asset traffic in nginx because it's just faster (fewer hops, a highly optimised C implementation).
So, with this in mind, and sticking with the philosophy of Minimum Viable Change, I still think that delivering this as an nginx config makes the most sense.
High volume installations will almost certainly already be running nginx in front of their Workhorse setup.
We have customers running Apache and other things between Workhorse and their load balancer, so saying "almost certainly" is imprecise at best.
I disagree that we should do it this way, but I don't care about it that much anymore. Just to stress one final time: the more differences in behaviour we have between different installations, the more issues we will encounter and the more problems we will have supporting users. We stabilised the NGINX config (fewer changes in the configuration) for a reason, that reason being that any change we made in the web server config caused issues for our users over prolonged periods of time (catching up with upgrades).
@andrewn If you want to take this on and do the MVP with your team, including testing on GitLab.com and communicating the change to the rest of the community, you are free to do so. At the time of writing, the Build team has no capacity to do this.
I'm sure 2 devs could deliver something that worked within a month
Great! The question - as always - comes down to people and priorities :-)
I assume that for this work we would need to rely on Golang developers... @stanhu how would you imagine staffing this effort / where would it fit in the stack of priorities?
@timzallmann can you provide some more insight into how enabling this would cut down total time from "web request made" to "page interactive" ?
Just to stress one final time: the more differences in behaviour we have between different installations, the more issues we will encounter and the more problems we will have supporting users.
I fully agree with this, and it has indeed been the philosophy all along.
After conversations with @stanhu and @sytses, I am now changing my opinion. We should indeed simply enable gzip in NGINX, as @andrewn was arguing. The reasoning is already laid out by others in this issue... principally the ease of doing this vs. trying to rebuild gzip in Workhorse, plus avoiding the need to maintain something built in-house.
The counterargument that I was weighing heavily -- that we wish to ship the benefits of these decisions to all users / customers -- still applies; but it only applies to customers and users who use an official installation method. Since NGINX is part of our GitLab.com setup, it is what we're using as the standard; see https://docs.gitlab.com/omnibus/settings/nginx.html and note the absence of an Apache version of that page.
Furthermore, enabling gzip in NGINX is the most boring solution, and the smallest iteration.
So with this in mind, who should pick up this task @stanhu ?
I understand the concerns of the build team, but I think for something like gzip that is well-tested and implemented in every major Web server, it makes sense to add the configuration in NGINX. There is a lot of cost in maintaining our own gzip support in Workhorse, and a lot can go wrong if it doesn't work properly. We should document the optimizations for Apache and other Web servers, but things should still work fine if they don't support it.
As @andrewn mentioned in https://gitlab.com/gitlab-org/gitlab-ce/issues/33719#note_33338312, this seems like a fairly straightforward task of tweaking the NGINX config to compress assets. I did a quick test on GitLab.com; however, I noticed the /assets folder is already being compressed with gzip:
@stanhu so our static assets are already always gzipped? :-( There goes some perceived low hanging fruit. And applying gzip to html and json is currently not advised because of the vulnerability that @ayufan highlighted... @briann appreciate your take on the vulnerability.
@timzallmann I guess this means that working on smart loading (webpack, code-splitting, async fetching, etc.) and smart asset sizing (pictures) is the next best thing?
@stanhu @ernstvn One way to mitigate BREACH is to selectively compress based on the content of the Referer header. If the referrer matches the instance URL, enable compression; otherwise don't.
I need to better understand the BREACH attack, but from my understanding it does not apply in our case, as it's not feasible for the attacker to inject arbitrary content into our pages.
If you look at almost any major website, they're using br or gz content-encoding over https.
@andrewn The theory is that a malicious website that can trick a user into visiting it, and can monitor that user's traffic to GitLab.com, can use custom-crafted links back to GitLab.com and watch the size of the responses to slowly determine their content, including sensitive data.
That's why checking the referrer and only compressing requests with a referrer that matches the GitLab instance is an effective counter-measure.
It sounds impossible but a malicious wifi gateway or proxy could do it.
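For illustration, a rough nginx sketch of that counter-measure (the instance URL and upstream name are placeholders); `gzip` can be set inside `if` in a location, so compression is simply switched off when the Referer doesn't match:

```nginx
location / {
    gzip on;
    gzip_comp_level 1;

    # Only compress when the request was initiated from our own pages.
    # Note: requests with no Referer at all (direct navigation, API clients)
    # will also go uncompressed under this rule.
    if ($http_referer !~ "^https://gitlab\.example\.com/") {
        gzip off;
    }

    proxy_pass http://gitlab-workhorse;  # assumed upstream name
}
```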
@briann so I've been investigating this further, and it appears that Rails has had built-in support for BREACH mitigation since August 2014.
They do this by XOR'ing the CSRF token with a random one-time pad, so the token bytes on the wire change on every response; this prevents the attack because the attacker can't converge on the secret by comparing compressed response sizes.
Enable it to work from Workhorse - would require (how many? 2?) developers for (how long? 1 month?) to be shippable.
I was looking at the gitlab-workhorse repository for something unrelated and found this: gitlab-workhorse!177 (diffs). Does this mean that Workhorse already supports this?
@marin assuming Jacob says "yes", then of course we switch it on there (right? why not?). But assuming Jacob says "no", or "not really", or anything less than a definitive "yes", we should just continue with the path of switching it on in NGINX. How much effort is involved in that? I believe it is fully on your team?
Someone on your in-house team may want to review and understand https://tom.vg/papers/heist_blackhat2016.pdf for more context on the compression attack surface with user-generated data / known-plaintext data. All of the GitLab static assets represent known plaintext.
The only truly effective mitigations I am aware of involve cipher suites that implement forward secrecy, in conjunction with the server injecting random-sized / random data into each response (could be via an HTTP header, an HTML meta tag, etc.).
Right, thanks for that article. The CSRF issue mentioned above is addressed by the Rails CSRF mask, but I think we could do more by adding the nginx length_hiding module:
This module appends a randomly generated HTML comment to the end of the HTML response to hide the correct length and make it difficult for attackers to guess secret information.
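If we went that route, the config would presumably look something like the sketch below; the `length_hiding` and `length_hiding_max` directive names are my recollection of that module's README, so treat them as assumptions to verify:

```nginx
# Requires nginx built with the third-party length-hiding filter module
server {
    listen 443 ssl;
    ...
    length_hiding on;        # append a random-length HTML comment to HTML responses
    length_hiding_max 2048;  # assumed upper bound on the padding size
}
```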
I think @stanhu is on the right track here. If you read the article @techguru posted, it shows that compression is not needed to exploit these types of vulnerabilities. The only solution offered is to disable third-party cookies, which we've discussed in the past and dismissed. I'd like to do some more research on the libraries Stan has linked.
@stanhu Are there reasons why we think checking the referer header wouldn't provide enough benefit? Do we expect most of the requests that would benefit from compression to not be including a proper referer, automated tasks and such?
While compression based on referer headers wouldn't provide as much benefit as compression + padding it would be a lot easier to implement quickly and wouldn't make us more vulnerable to the non-compression attacks.
Are there reasons why we think checking the referer header wouldn't provide enough benefit? Do we expect most of the requests that would benefit from compression to not be including a proper referer, automated tasks and such?
Wouldn't a compression-based attack just forge the referer header then?
You can "spoof" it for your own browser. As in, you can run curl with a custom referer header. But a malicious website cannot ask you to request a resource from another server using a spoofed header. Not without a browser vulnerability that allows it.
@stanhu As Paul Charlton mentions in the MR, that won't work for us. It might work for Omnibus installs on a single host, but our fleet FQDNs are set to their hostnames.
It would be nice if we could simply check whether the Referer hostname matched the Host header. That way nginx wouldn't even need to know the primary URL. Unfortunately, we don't yet verify the Host header.
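A hedged sketch of that idea, assuming we do start verifying the Host header: extract the hostname from the Referer with a `map` and only compress when it matches `$host`:

```nginx
# Pull the hostname (minus any port) out of the Referer header; empty if absent
map $http_referer $referer_host {
    default                        "";
    "~^https?://(?<rhost>[^/:]+)"  $rhost;
}

server {
    ...
    location / {
        gzip on;
        if ($referer_host != $host) {
            gzip off;   # missing or foreign Referer: skip compression
        }
        ...
    }
}
```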
@stanhu Currently I don't see any compression on dynamic text responses (neither HTML nor JSON). So will we never enable this due to the security discussion, or what is our current status? I constantly get reminded by all the performance tools that we could potentially save 80% of our bandwidth on those types of documents, and I only saw the still-open discussion in your MR.
Currently I don't see any compression on dynamic text responses (neither HTML nor JSON). So will we never enable this due to the security discussion, or what is our current status?
I think we are stalled on the security implications for turning on compression for all dynamic content. It's easy to enable it in NGINX; the question is whether we've done enough to mitigate possible attacks.
Disabling third-party cookies is the most effective solution that I know of. Second is disabling compression, which helps mitigate BREACH but not HEIST, and is not practical IMHO. Although, could it become a configurable option for on-prem?
Considering we don't use third-party cookies on GitLab.com, and we've deployed the built-in Rails BREACH defence mechanisms, is the security group willing for us to go ahead with compressing HTML responses on GitLab.com?
This will make a big difference to performance on GitLab.com.
Thanks from my side too; this would make me, as an FE engineer, very happy. It would definitely be a huge performance boost. All my measurements and tools tell me that, at the moment, compression and image resizing are the biggest available performance improvements and would outweigh any programmatic frontend improvement we could make by a big factor. Compressing JSON responses especially would definitely help there.
So the earlier we can get this the better it would be.
My understanding here is that we are looking to do this just on GitLab.com, or will we be shipping this in the Omnibus package?
Do we need to consider the impact on self-hosted customers as well @timzallmann? If so, can you make sure your security assessment mentions if it's applicable for self-hosted or not, and what we expect there.
I believe the BREACH attack would also affect self-hosted.
For now since there's no industry standard or best practice for resolving this, let's accept this risk and move forward with adding compression. We can revisit it once further developments on BREACH and HEIST have been made.
In the meantime, I think we should pursue a simpler mitigation for both BREACH and HEIST by using same-site cookies. Though only Chrome and Firefox currently support them, it should be a small, effective step in the right direction considering web browser market share.