Static Egress IP Addresses for Cloud Run Based Runway Services

Problem Statement

Currently, Runway services deployed on Cloud Run (AI Gateway, DWS, glgo, PVS, etc.) send requests from a dynamic IP address pool. This creates challenges for downstream services that need to:

Implement rate limiting while allowing legitimate Runway service traffic to pass through unrestricted
Protect against DDoS attacks by allowlisting trusted traffic sources
Secure billing-critical endpoints by restricting access to known, trusted sources only

Without static egress IPs, downstream services cannot reliably distinguish between legitimate Runway service requests and potentially malicious traffic.

Business Impact

Usage Billing Protection: Data Insights Platform (DIP) ingress needs to lock down access so that billing events are accepted only from legitimate sources such as AI Gateway
Security Posture: WAF/Cloudflare rate limiting rules cannot effectively protect endpoints while allowing Runway services through
Operational Overhead: Current workarounds (high rate limit thresholds, request characteristic fingerprinting) are less reliable and harder to maintain than an IP allowlist

Proposed Solution

Provide static egress IP addresses for Cloud Run based Runway services:

One static IP for staging environment services
One static IP for production environment services
These IPs would be the source address for all outbound requests from: AI Gateway, DWS, glgo, PVS, and other Cloud Run based services

The IP addresses do not need to come from GitLab-owned blocks; GCP-provided static IPs are acceptable.
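
As a rough illustration of the proposal, one address per environment could be reserved in Terraform along the following lines. This is a sketch under assumptions, not the Runway implementation: the variables `environment`, `project` and `region`, the resource label `egress`, the address name pattern, and the output name are all hypothetical.

```hcl
# Hypothetical inputs; in practice these would come from the Runway Provisioner.
variable "environment" {
  type        = string
  description = "Either \"staging\" or \"production\" (assumed naming)."
}

variable "project" {
  type        = string
  description = "GCP project that hosts the Cloud Run services (assumption)."
}

variable "region" {
  type        = string
  description = "Region of the Cloud Run services; Cloud NAT addresses are regional."
}

# One GCP-provided, regional external address for this environment.
resource "google_compute_address" "egress" {
  name         = "runway-cloud-run-egress-${var.environment}" # hypothetical naming scheme
  project      = var.project
  region       = var.region
  address_type = "EXTERNAL"
}

# Expose the address so downstream teams can add it to their allowlists.
output "egress_ip" {
  value = google_compute_address.egress.address
}
```

Applying this once per environment would yield the two addresses described above.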

Implementation Approach

Based on Google Cloud documentation, this can be implemented using Cloud Router and Cloud NAT:

Reserve static external IP addresses: Create google_compute_address resources for staging and production
Use existing Terraform module: The terraform-google-modules/cloud-router/google module is already in use for GKE (see modules/runtimes/gcp_gke/network.tf)

Configure Cloud NAT with manual IP allocation:

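nats = [
  {
    # Each NAT entry also needs a name; this value is a placeholder.
    name = "runway-cloud-run-nat"
    # MANUAL_ONLY makes Cloud NAT use only the reserved addresses listed below.
    nat_ip_allocate_option = "MANUAL_ONLY"
    nat_ips                = [google_compute_address.static.self_link]
  }
]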

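Configure Cloud Run services: Attach each service to the VPC (for example via a VPC connector) and route all of its egress traffic, not only traffic to private ranges, through the VPC so that outbound requests leave via the NAT gateway. This likely needs to happen in the Reconciler, i.e. in the runwayctl repository.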
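
Taken together, the steps above might look roughly like the following Terraform. It is a sketch under stated assumptions rather than the Runway implementation: the variables, resource labels, names, the connector CIDR range and the example service are placeholders, and the Cloud Run part may in practice be generated by the Reconciler instead. The sketch assumes a Serverless VPC Access connector with `egress = "ALL_TRAFFIC"`, so that requests to public destinations also leave through the NAT gateway.

```hcl
# Hypothetical inputs.
variable "project" { type = string }
variable "region" { type = string }
variable "network" { type = string } # name of an existing VPC network (assumption)

# Step 1: reserve the static external address used by Cloud NAT.
resource "google_compute_address" "static" {
  name    = "runway-cloud-run-egress" # placeholder
  project = var.project
  region  = var.region
}

# Steps 2 and 3: Cloud Router plus Cloud NAT with manual IP allocation.
module "cloud_router" {
  source = "terraform-google-modules/cloud-router/google"
  # Pin the same module version that modules/runtimes/gcp_gke/network.tf uses.

  name    = "runway-cloud-run-router" # placeholder
  project = var.project
  region  = var.region
  network = var.network

  nats = [
    {
      name                   = "runway-cloud-run-nat" # placeholder
      nat_ip_allocate_option = "MANUAL_ONLY"
      nat_ips                = [google_compute_address.static.self_link]
    }
  ]
}

# Step 4a: connector that links Cloud Run to the VPC.
resource "google_vpc_access_connector" "egress" {
  name          = "runway-egress" # placeholder
  project       = var.project
  region        = var.region
  network       = var.network
  ip_cidr_range = "10.8.0.0/28" # placeholder; must be an unused /28 range
}

# Step 4b: a Cloud Run service that sends all egress traffic through the connector.
resource "google_cloud_run_v2_service" "example" {
  name     = "example-service" # placeholder
  project  = var.project
  location = var.region

  template {
    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello" # Google's public sample image
    }

    vpc_access {
      connector = google_vpc_access_connector.egress.id
      egress    = "ALL_TRAFFIC" # without this, only private-range traffic uses the VPC
    }
  }
}
```

With this wiring, outbound requests from the service would be expected to arrive at downstream endpoints from the reserved address, which is what the allowlisting use case relies on.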

Timeline

Target: Mid-November 2025 to support Usage Billing launch and DIP security requirements.

Open Questions

Does this approach remain relevant given the discussion in gitlab-org/architecture/usage-billing/design-doc#11 about using NATS (the messaging system, not to be confused with Cloud NAT) to decouple Snowplow and AI Gateway? If NATS is used for usage metrics ingestion, would the Snowplow endpoint still need to be publicly reachable?

Success Criteria

Downstream services (DIP, Snowplow) can configure WAF/Cloudflare rules to allowlist specific static IPs
Rate limiting can be applied to all non-allowlisted traffic without impacting Runway services
Solution is in place before Usage Billing goes live in mid-November 2025
Implementation follows existing Runway infrastructure patterns and uses established Terraform modules

Related Issues

gitlab-org/gitlab#571768+
gitlab-org/architecture/usage-billing/design-doc#11+

References

Google Cloud: Configuring static outbound IP for Cloud Run
terraform-google-modules/cloud-router/google
Existing implementation: modules/runtimes/gcp_gke/network.tf in Runway Provisioner repository

Edited Oct 28, 2025 by Florian Forster