2025-09-11: New traffic causing poor canary performance
New traffic causing poor canary performance (Severity 3 (Medium))
Problem: A large volume of new traffic is scraping public GitLab.com resources (users, projects, groups, etc.). This traffic includes requests to expensive endpoints, and the effect is most noticeable on our "gitlab-org/gitlab" project.
Impact: Although the traffic is not focused on the Gitaly CNY server and its projects, that is where the impact is felt most. Users experienced slow requests, increased error rates, failed CI pipeline runs, and periods of reduced service quality. Other projects and resources appear less affected, most likely because they have more headroom to absorb the requests without significant impact.
Causes: A surge in automated traffic querying expensive repository endpoints such as '/-/tree', '/-/blob', '/-/commits', and '/-/network'. Those endpoints cause the most disruption, but the traffic also contains many less expensive requests. The requests, coming from many unique IPs and mainly from Brazil, caused high rates of FindCommit RPC calls, resource exhaustion, and degraded system performance. No internal changes triggered the incident.
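To illustrate how the traffic described above could be broken down by endpoint and origin, here is a minimal Python sketch. It assumes request logs are available as dictionaries with hypothetical `path` and `country` fields; those field names and the helper functions `classify` and `tally` are illustrative only and are not taken from our logging pipeline. Only the four path markers come from this report.

```python
from collections import Counter
from urllib.parse import urlsplit

# Path markers named in this report as the most disruptive endpoints.
EXPENSIVE_MARKERS = ("/-/tree", "/-/blob", "/-/commits", "/-/network")


def classify(path: str) -> str:
    """Return the expensive marker contained in a request path, or 'other'."""
    clean = urlsplit(path).path  # drop any query string or fragment
    for marker in EXPENSIVE_MARKERS:
        # Match the marker as a whole path segment, so that
        # '/-/commit/<sha>' is not mistaken for '/-/commits'.
        if clean.endswith(marker) or f"{marker}/" in clean:
            return marker
    return "other"


def tally(records):
    """Count requests per (endpoint class, country).

    `records` is an iterable of dicts with hypothetical 'path' and
    'country' keys; a missing country is counted as 'unknown'.
    """
    counts = Counter()
    for rec in records:
        key = (classify(rec["path"]), rec.get("country", "unknown"))
        counts[key] += 1
    return counts
```

Calling `tally(records).most_common(10)` on a sample of log records would list the busiest endpoint and country combinations, which is one way to check whether the expensive endpoints and a single region dominate the traffic.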
Response strategy: We disabled and re-enabled key feature flags, disabled Gitaly transactions, and implemented a Cloudflare JavaScript Challenge and targeted rate limiting rules for high-traffic paths and specific countries. Temporary logging rules were used to confirm effectiveness before enforcement. Major page faults, disk latency, and error rates have since dropped and are now stable. However, the traffic persists and could change or intensify, causing further issues.
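The "log first, then enforce" approach mentioned above can be sketched generically as follows. This is not Cloudflare's implementation or rule syntax; it is a simplified sliding-window limiter in Python, and the class name `RateLimitRule`, its parameters, and the `would_block` counter are all hypothetical names introduced for illustration.

```python
from collections import defaultdict, deque


class RateLimitRule:
    """Sliding-window rate limit with a log-only mode (illustrative sketch)."""

    def __init__(self, limit, window_seconds, enforce=False):
        self.limit = limit              # max requests allowed per window
        self.window = window_seconds    # window length in seconds
        self.enforce = enforce          # False: only log what would be blocked
        self.hits = defaultdict(deque)  # key (e.g. client IP) -> timestamps
        self.would_block = 0            # requests that exceeded the limit

    def check(self, key, now):
        """Record a request at time `now` and return 'allow', 'log', or 'block'."""
        timestamps = self.hits[key]
        # Discard timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        # Every request is recorded, including ones that end up blocked.
        timestamps.append(now)
        if len(timestamps) > self.limit:
            self.would_block += 1
            return "block" if self.enforce else "log"
        return "allow"
```

Running a rule with `enforce=False` against live or replayed traffic and inspecting `would_block` gives a rough idea of how many requests the rule would have stopped, before flipping `enforce` to `True`.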
This ticket was created to track INC-3862, by incident.io