I could use your help, @briann @stanhu @pcarranza @andrewn @jtevnan, in picking which one is right for us. The cost implications are significant for the more expensive plans.
@andrewn points out: "It seems, from reading the Cloudflare docs and from phoning their pre-sales support team, that Cloudflare doesn't support a classic CDN setup. Rather, they have to take over your DNS configuration for gitlab.com, which I feel is too risky for what we're trying to achieve here (which is a simple CDN setup)"
It would also make sense to factor in putting some of about.gitlab.com on the CDN as one of the next steps, especially as Cloudflare has two image/mobile optimisation tools that could speed things up there; we have repeatedly seen images that are far too large on the blog and website.
OK, started the sign up process for a free account, using ops-contact+cloudflare@gitlab.com.
Cloudflare did not like canary.gitlab.com, but when I gave it https://gitlab.com, it did a DNS search and listed ~20 entries. At this point I'd like to transfer the task to @jtevnan since I think he'll have a better handle on this than I will.
So, having looked at the options, I think we should consider an alternative provider to Cloudflare.
Why? It seems, from reading the Cloudflare docs and from phoning their pre-sales support team, that Cloudflare doesn't support a classic CDN setup. Rather, they have to take over your DNS configuration for gitlab.com, which I feel is too risky for what we're trying to achieve here (which is a simple CDN setup)
PS: setting up AWS CloudFront distributions is free (it's the traffic that you pay for, not the distribution), so, in the interests of proving that it can be done, I created one under the Gitter AWS account.
Keep in mind that this test was done from London, which has good links and is relatively close to the US East Coast. While the "with CDN" times for China, Japan or Australia would remain roughly the same, the "without CDN" times are likely to be much, much higher.
Should we then run only the GL assets through the CDN, or also all content assets (uploads, avatars, etc.)?
This will mean significantly different traffic. I just tested it with my avatar (390 ms with the CDN vs. 600 ms without, over 3 tries), so that would give us an additional boost. I don't think this would work out of the box, but I can look into getting it into 9.4, as I just did a lot of work on image serving for the lazy image loading.
@mikegreiling Do we need to do something on the FE side for CDN support?
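For reference, the avatar timings quoted above work out as follows. This is only a minimal arithmetic sketch in Python; the function name `latency_reduction` is made up for illustration, and the two numbers are the single avatar test from this comment, not a benchmark.

```python
def latency_reduction(with_cdn_ms: float, without_cdn_ms: float) -> float:
    """Return the fractional drop in load time when the CDN is used."""
    if without_cdn_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (without_cdn_ms - with_cdn_ms) / without_cdn_ms


# Numbers from the avatar test above: 390 ms with CDN, 600 ms without.
reduction = latency_reduction(with_cdn_ms=390, without_cdn_ms=600)
print(f"{reduction:.0%} lower load time with the CDN")  # prints: 35% lower load time with the CDN
```

With only three tries from one location, this should be read as a rough indication rather than a measured speedup.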
However, it might be worth simply enabling @stanhu's change as a first step on Canary. This way the initial set of assets can be loaded from the CDN but Webpack will continue to load assets from the main site until @mikegreiling's change is implemented.
There may be technical reasons why we can't do this (possibly security headers), but if we can, doing this will give us a performance advantage now, with a clear path forward to further improvements once Webpack starts async loading bundles from the CDN too.
Thx @andrewn for the info + links. I knew that @stanhu had already done the Ruby part; I will take a look at what that option actually does (I was missing the issue), because I think it might deliver avatars but not the content images.
@mikegreiling: We should perhaps make sure that Webpack gets the info on the fly from that base Rails environment variable to make things easier, and that this goes into 9.4 if possible.
Is it that much work? We do a ton of stuff that is not an OKR.
And the best news is that this is responsive to an OKR, namely around improving the latency of GitLab.com. Sure it isn't called out explicitly for Production, but it's a company goal for sure and that's the kind of flexibility we allow for ourselves in the OKR process.
But more to the point @pcarranza if we do not have enough people to get this + OKR goals done in the next weeks / months, so be it, let's just make it very apparent what things are being done that are not in the OKRs and make sure those are all prioritized appropriately.
@ernstvn we are trying to keep the unscheduled and low priority to a minimum because we do get heavily randomized.
This requires someone sitting, reading, and evaluating options, then planning, then acting. It could be easy as in a couple of days (best case scenario), or it could be hard as in a couple of WoW. Because this is not an OKR, it gets constantly de-prioritized in favor of unblocking others, which means that it just keeps getting delayed.
By making it an OKR (because it's related to performance, and this is a company goal) we will make it explicit that this has to happen soon, and we will be able to resource it as such.
@ernstvn this brings up an interesting point: it feels like we have a misalignment between production OKRs, infrastructure OKRs, and company-wide OKRs.
@pcarranza I'll propose an update to the OKRs to better align them. The end result of that update will be that performance related items carry more weight. Not sure if the level of detail (enable CDN) is needed but we will see. We can discuss that separately there; so that we keep this issue focused on the CDN.
So - with the assumption that this should be prioritized as an OKR item - can you please review where it then stacks compared to everything else and see who can work on it when?
It might be worth, alongside this evaluation, using canary.gitlab.com to test against the existing CDN I set up in AWS CloudFront (which uses https://canary.gitlab.com as its origin).
This would involve setting the following environment variable on the canary.gitlab.com instance....
Switching between CDN providers is trivial: just set up a new account and change GITLAB_CDN_HOST
There may well be further code changes that we need to implement that only testing will highlight.
For example, we may need to add the CDN host to our Content-Security-Policy header (unless Rails automatically adds action_controller.asset_host to the header?)
The best way of finding out what these (unknown-unknown) issues are is to test against Canary as early as possible. We can do this now.
Once we've done the testing, we can revert the change
If we discover any issues, we'll be able to address them in parallel with code fixes
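To make the Content-Security-Policy concern above concrete, here is a minimal Python sketch of what "adding the CDN host to the header" would mean. It is an illustration only, not how Rails or GitLab builds the header: the function `add_csp_source`, the sample policy, and the host `https://cdn.example.com` are all hypothetical.

```python
def add_csp_source(policy: str, host: str,
                   directives=("script-src", "style-src", "img-src")) -> str:
    """Append `host` to the listed directives of a CSP header value.

    Directives that are absent from the policy are left absent, so the
    browser keeps falling back to default-src for them.
    """
    updated = []
    for part in policy.split(";"):
        tokens = part.split()
        if not tokens:
            continue  # skip empty segments, e.g. after a trailing ';'
        if tokens[0] in directives and host not in tokens[1:]:
            tokens.append(host)
        updated.append(" ".join(tokens))
    return "; ".join(updated)


# Hypothetical policy and CDN host, for illustration only.
policy = "default-src 'self'; script-src 'self'; img-src 'self' data:"
print(add_csp_source(policy, "https://cdn.example.com"))
```

If the header on Canary turns out to block CDN-served assets, a change of this shape (wherever the header is actually generated) is what we would need.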
Four providers were brought forth; Cloudflare was discarded for the reasons stated above. The remaining three shook out as follows:
Amazon CloudFront
Pros
Strong Developed and Documented API
Large Global POP Distribution
Pay for What you Use Pricing
Cons
API Missing some key features around pre-warming, health-check management, eviction policies
Many features of a tightly integrated CDN are only realized when using other aspects of AWS services (Robust health reporting on target nodes, load balancer distribution of target fetches, etc)
Not as fast as the others, despite having a global footprint (same effect is noticed with Route53 DNS)
~10 Minute propagation time for version changes
Does not shield the origin node used for population
Logging is only available after the fact, in files written to an S3 bucket at rotation
No advanced edge ACLs or controls
No POPs in remote regions like Africa, the Middle East, or Russia
Fastly
Pros
Robust API w/ full feature parity of every GUI configuration option
Fine-tuned VCL control of caching server inner workings (if desired)
Better SNI SSL support
Instant version change propagation
Can shield origin node and Origin pull headers
Can stream logs to whatever target you desire for complete log cohesiveness
Full edge ACL'ing
Anycast IP Provided for CDN serve point
Cons
Monthly fee of $50/mo on top of usage
Fewer POPs than CloudFront in major market cities
Slightly higher bandwidth costs than CloudFront (0.035 cents more/GB)
CDN77
Pros
No monthly usage fee
Slightly cheaper than both CloudFlare and Fastly (0.7 cents less/GB and 0.3 cents less/GB respectively)
Based on fullness of feature set, ease of configurability, automation, and performance, Fastly will be scheduled for PoC testing on canary.gitlab.com, followed by a trial run on gitlab.com.
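For a rough sense of what the Fastly cons cost in practice, the two figures listed above ($50/mo fee, 0.035 cents more per GB than CloudFront) can be combined as in this Python sketch. The traffic volumes are hypothetical placeholders, not measured gitlab.com numbers, and the function name is made up for illustration.

```python
FASTLY_MONTHLY_FEE_USD = 50.0       # from the comparison above
FASTLY_EXTRA_CENTS_PER_GB = 0.035   # Fastly vs. CloudFront, from the comparison above


def fastly_premium_usd(monthly_gb: float) -> float:
    """Extra monthly cost of Fastly over CloudFront for a given traffic volume."""
    return FASTLY_MONTHLY_FEE_USD + monthly_gb * FASTLY_EXTRA_CENTS_PER_GB / 100


# Hypothetical traffic volumes in GB per month.
for gb in (1_000, 10_000, 100_000):
    print(f"{gb:>7} GB/month: ${fastly_premium_usd(gb):.2f} more than CloudFront")
```

At these volumes the flat fee dominates; the per-GB difference only starts to matter at much higher traffic.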
@northrup Is there an issue with setting GITLAB_CDN_HOST on canary? @mikegreiling Does Webpack build all the URLs needed for the site? Does it need to consider the CDN host?
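To illustrate what "considering the CDN host" would mean for any code that builds asset URLs, here is a minimal sketch. Python is used purely for illustration (the real logic would live in the Rails and Webpack configuration), and both the helper `asset_url` and the host `https://cdn-a.example.com` are hypothetical; only the variable name GITLAB_CDN_HOST comes from this thread.

```python
import os


def asset_url(path: str, env=os.environ) -> str:
    """Prefix an asset path with the CDN host when GITLAB_CDN_HOST is set."""
    cdn_host = env.get("GITLAB_CDN_HOST", "").rstrip("/")
    if not cdn_host:
        return path  # no CDN configured: keep serving from the main site
    return f"{cdn_host}/{path.lstrip('/')}"


# Hypothetical host; switching providers would only change this one value.
fake_env = {"GITLAB_CDN_HOST": "https://cdn-a.example.com"}
print(asset_url("/assets/application.css", fake_env))
print(asset_url("/assets/application.css", {}))
```

Any URL that is built without going through a helper like this (for example, bundles Webpack loads asynchronously) would keep pointing at the main site.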
Although it may not be affordable to proxy *.gitlab.io, for the rest of the asset (sub)domains I think you should consider trying out the Business plan, given its $200/mo flat rate (no bandwidth costs). It depends on your budget.