Update Cloudflare alerts
During production#2148 we noticed a large number of 503 errors reported by Cloudflare that were not reflected in any of our other monitoring within the environment, aside from the alert "Firing 1 - The `waf` service, `zone_gitlab_com` component, `main` stage, has an error burn-rate exceeding SLO". Following a discussion with Cloudflare support, we need to adjust our monitoring to distinguish the cases below.
To summarize for our use-case:
- If we see `EdgeResponseStatus` 503 and `OriginResponseStatus` 0, this is a block in the context of "I'm Under Attack" mode
- If we see `EdgeResponseStatus` 503 and `OriginResponseStatus` 503, this is a 503 we served
- If we see `EdgeResponseStatus` 522 or 523 and `OriginResponseStatus` 0, Cloudflare was unable to reach our origin
I would disregard the actual 503s with `cloudflare-nginx` as a rarity for our use case, which we would probably also notice in another way (dropped traffic, etc.)
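
The three cases above could be expressed as a small classification step over log entries. This is only a minimal sketch: the function name `classify_response` and the label strings are hypothetical, and it assumes the `EdgeResponseStatus` and `OriginResponseStatus` values are already available as integers. It is not how our alerting is currently implemented.

```python
def classify_response(edge_status: int, origin_status: int) -> str:
    """Map an (EdgeResponseStatus, OriginResponseStatus) pair to a case.

    The returned labels are hypothetical names chosen for illustration.
    """
    if edge_status == 503 and origin_status == 0:
        # Blocked at the edge in the context of "I'm Under Attack" mode;
        # the request never reached our origin.
        return "under_attack_block"
    if edge_status == 503 and origin_status == 503:
        # Our origin itself answered with a 503.
        return "origin_503"
    if edge_status in (522, 523) and origin_status == 0:
        # Cloudflare was unable to reach our origin.
        return "origin_unreachable"
    # Anything else is outside the cases discussed in this issue.
    return "other"


# Made-up sample pairs, purely to show usage.
samples = [(503, 0), (503, 503), (522, 0), (523, 0), (200, 200)]
labels = [classify_response(edge, origin) for edge, origin in samples]
```

With a split like this, only `origin_503` and `origin_unreachable` would need to count against the error SLO, while `under_attack_block` could be tracked separately.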
cc @andrewn @hphilipps @AnthonySandoval
Edited by Anna Liisa Moter