All unauthenticated web requests are rate limited following an attempt to bump the Helm chart version in gstg
What Happened
I merged the chart bump MR gitlab-com/gl-infra/k8s-workloads/gitlab-com!4553 (merged), which contained the changes gitlab-org/charts/gitlab@779f351e...9ce01717. That diff included the GitLab Helm chart MR that introduced IPv6 support, gitlab-org/charts/gitlab!4072 (merged). The migration to IPv6 had already caused #21298 (closed), because Puma started binding to tcp://[::]:8080 instead of tcp://0.0.0.0:8080.
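The effect of the binding change can be illustrated with a small Python sketch. A socket bound to `[::]` usually also accepts IPv4 connections (dual-stack, when `IPV6_V6ONLY` is off), and it reports those peers as IPv4-mapped IPv6 addresses such as `::ffff:127.0.0.1` rather than `127.0.0.1`. The sketch assumes, hypothetically, that some trusted-proxy check compares peer addresses against IPv4 ranges like `127.0.0.0/8`, and shows that such a check misses the mapped form unless the address is unwrapped first. `TRUSTED_PROXIES` and `is_trusted` are illustrative names; this is one plausible mechanism, not a confirmed root cause.

```python
import ipaddress

# Hypothetical trusted-proxy list, assumed for illustration only
# (not taken from GitLab's or Rails' actual configuration).
TRUSTED_PROXIES = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
]


def is_trusted(addr: str, normalize: bool) -> bool:
    """Return True if addr falls inside one of TRUSTED_PROXIES.

    With normalize=True, an IPv4-mapped IPv6 address (::ffff:a.b.c.d)
    is unwrapped to its IPv4 form before matching.
    """
    ip = ipaddress.ip_address(addr)
    if normalize and isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # An IPv6 address is never "in" an IPv4 network (and vice versa),
    # so the raw mapped form cannot match 127.0.0.0/8.
    return any(ip in net for net in TRUSTED_PROXIES)


peer_on_v4_listener = "127.0.0.1"        # local peer as seen by a 0.0.0.0 listener
peer_on_dual_stack = "::ffff:127.0.0.1"  # same peer as seen by a [::] listener

print(is_trusted(peer_on_v4_listener, normalize=False))  # True
print(is_trusted(peer_on_dual_stack, normalize=False))   # False
print(is_trusted(peer_on_dual_stack, normalize=True))    # True
```

If a check like this decides that the immediate peer is not a trusted proxy, an application would typically stop looking at forwarded headers and use the peer address itself as the client IP.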
Once the new chart version was deployed to all clusters in gstg, unauthenticated web requests started returning HTTP 429 (Rate limit exceeded), and the login page was not displayed properly. The problem was resolved by reverting the chart bump. (Incident: production#20121 (closed))
While this version of the chart was deployed to staging, we saw this instead of the sign-in page:
Notably, logged-in users were still able to use staging.
Logs during the Incident
Logs from rails-inf-gstg showed that the errors originated in the part of the Rails codebase that throttles requests based on ApplicationSettings: throttle.rb
All the errors were coming from the throttle_unauthenticated_web section of the throttling logic.
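Interestingly, the remote.ip field in all of these log lines was ::ffff:127.0.0.1, the IPv4-mapped IPv6 form of the loopback address 127.0.0.1. Because every request appeared to come from the same address, all of them counted against the same per-IP limit, so the throttle kept triggering even though the requests actually came from many different IP addresses (different CI runners, various team members' laptops). This appears to be the crux of the issue.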
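To make this concrete, here is a minimal Python sketch of a per-IP fixed-window throttle. It is an illustrative stand-in, not GitLab's throttle implementation; the class name, the limit of 3 requests, the 60-second window, and the example client IPs are all hypothetical. When every request reports the same remote IP, five distinct clients end up sharing a single counter and the later requests are rejected with 429.

```python
from collections import defaultdict


class FixedWindowThrottle:
    """Hypothetical per-IP fixed-window limiter (illustrative only)."""

    def __init__(self, limit: int, period_s: float):
        self.limit = limit
        self.period_s = period_s
        # (ip, window index) -> request count; old windows are never pruned here.
        self.counts = defaultdict(int)

    def allow(self, remote_ip: str, now: float) -> bool:
        window = int(now // self.period_s)
        key = (remote_ip, window)
        self.counts[key] += 1
        return self.counts[key] <= self.limit


def simulate(client_ips, reported_ip=None, limit=3):
    """Return an HTTP status per request.

    If reported_ip is set, it replaces every client's real IP as the
    throttle key, mimicking all requests logging the same remote.ip.
    """
    throttle = FixedWindowThrottle(limit=limit, period_s=60)
    statuses = []
    for ip in client_ips:
        key_ip = reported_ip if reported_ip is not None else ip
        statuses.append(200 if throttle.allow(key_ip, now=0.0) else 429)
    return statuses


clients = ["198.51.100.1", "198.51.100.2", "203.0.113.7", "203.0.113.8", "192.0.2.9"]
print(simulate(clients))                                  # [200, 200, 200, 200, 200]
print(simulate(clients, reported_ip="::ffff:127.0.0.1"))  # [200, 200, 200, 429, 429]
```

With real client IPs each key sees one request, so nothing is throttled; with a shared reported IP the limit is exhausted after three requests regardless of who sent them.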
To verify this, I looked at older logs from the same code path and confirmed that they recorded the real client IP address in the remote.ip field:
This problem did not occur in Preprod, which suggests that, once again, something specific to our gstg setup is causing it.
Exit Criteria
We will collaborate with #g_distribution on this issue, providing help on GitLab.com-specific setup questions.
- Find the reason that remote.ip is ::ffff:127.0.0.1
- Figure out why this happens only in gstg
- Fix the root cause in the upstream GitLab Rails codebase or the GitLab Helm chart


