Static Egress IP Addresses for Cloud Run Based Runway Services
Problem Statement
Currently, Runway services deployed on Cloud Run (AI Gateway, DWS, glgo, PVS, etc.) send requests from a dynamic IP address pool. This creates challenges for downstream services that need to:
- Implement rate limiting while allowing legitimate Runway service traffic to pass through unrestricted
- Protect against DDoS attacks by allowlisting trusted traffic sources
- Secure billing-critical endpoints by restricting access to known, trusted sources only
Without static egress IPs, downstream services cannot reliably distinguish between legitimate Runway service requests and potentially malicious traffic.
Business Impact
- Usage Billing Protection: Data Insights Platform (DIP) ingress needs to lock down access to the correct sources of billing events from AI Gateway
- Security Posture: WAF/Cloudflare rate limiting rules cannot effectively protect endpoints while allowing Runway services through
- Operational Overhead: Current workarounds (high rate limit thresholds, request characteristic fingerprinting) are less reliable and harder to maintain
Proposed Solution
Provide static egress IP addresses for Cloud Run based Runway services:
- One static IP for staging environment services
- One static IP for production environment services
- These IPs would be the source address for all outbound requests from: AI Gateway, DWS, glgo, PVS, and other Cloud Run based services
The IP addresses do not need to come from GitLab-owned blocks; GCP-provided static IPs are acceptable.
Implementation Approach
Based on Google Cloud documentation, this can be implemented using Cloud Router and Cloud NAT:
- Reserve static external IP addresses: Create google_compute_address resources for staging and production
- Use existing Terraform module: The terraform-google-modules/cloud-router/google module is already in use for GKE (see modules/runtimes/gcp_gke/network.tf)
- Configure Cloud NAT with manual IP allocation:

      nats = [
        {
          nat_ip_allocate_option = "MANUAL_ONLY"
          nat_ips                = [google_compute_address.static.self_link]
        }
      ]

- Configure Cloud Run services: Set up the VPC connector to route egress traffic through the NAT gateway. This likely needs to happen in the Reconciler, i.e. in the runwayctl repository.
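The steps above could fit together roughly as in the following Terraform sketch for a single environment. It is illustrative only and not the Runway implementation. The variable names, resource names, and container image are placeholders, and it assumes a Serverless VPC Access connector already exists in the target network. The Cloud Run resource is included only to show the egress setting; in Runway that configuration would likely be applied by the Reconciler instead.

```hcl
# Sketch under stated assumptions: all names and variables below are hypothetical.

variable "project_id" { type = string }
variable "region"     { type = string }
variable "network"    { type = string } # VPC that Cloud Run egress is routed through (assumed to exist)
variable "connector"  { type = string } # ID of an existing Serverless VPC Access connector (assumed to exist)

# Step 1: reserve a regional static external IP (one per environment).
resource "google_compute_address" "static" {
  project      = var.project_id
  name         = "runway-egress-staging" # hypothetical name
  region       = var.region
  address_type = "EXTERNAL"
}

# Steps 2 and 3: Cloud Router plus a NAT gateway that uses only the reserved address.
# Pin a module version in real use.
module "cloud_router" {
  source = "terraform-google-modules/cloud-router/google"

  project = var.project_id
  name    = "runway-egress-router" # hypothetical name
  region  = var.region
  network = var.network

  nats = [
    {
      name                               = "runway-egress-nat" # hypothetical name
      nat_ip_allocate_option             = "MANUAL_ONLY"
      nat_ips                            = [google_compute_address.static.self_link]
      source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
    }
  ]
}

# Step 4: send all outbound traffic from the service through the VPC so it
# leaves via the NAT gateway. Shown as a plain resource for illustration only.
resource "google_cloud_run_v2_service" "example" {
  project  = var.project_id
  name     = "example-service" # hypothetical name
  location = var.region

  template {
    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello" # placeholder image
    }

    vpc_access {
      connector = var.connector
      egress    = "ALL_TRAFFIC" # without this, only private-range traffic uses the VPC
    }
  }
}

# The address downstream services would add to their allowlists.
output "egress_ip" {
  value = google_compute_address.static.address
}
```

With a layout like this, the staging and production addresses would come from two instances of the same configuration, and the egress_ip output is the value to hand to DIP and Snowplow owners for their WAF/Cloudflare rules.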
Timeline
Target: Mid-November 2025 to support Usage Billing launch and DIP security requirements.
Open Questions
Does this approach remain relevant given the discussion in gitlab-org/architecture/usage-billing/design-doc#11 about using NATS to decouple Snowplow and AI Gateway? If NATS is used for usage metrics ingestion, would the Snowplow endpoint still need to be publicly reachable?
Success Criteria
- Downstream services (DIP, Snowplow) can configure WAF/Cloudflare rules to allowlist specific static IPs
- Rate limiting can be applied to all non-allowlisted traffic without impacting Runway services
- Solution is in place before Usage Billing goes live in mid-November 2025
- Implementation follows existing Runway infrastructure patterns and uses established Terraform modules
Related Issues
- gitlab-org/gitlab#571768+
- gitlab-org/architecture/usage-billing/design-doc#11+
References
- Google Cloud: Configuring static outbound IP for Cloud Run
- terraform-google-modules/cloud-router/google
- Existing implementation: modules/runtimes/gcp_gke/network.tf in Runway Provisioner repository