Minimize customer permit-list configurations

Context

Cloud Connector features require an always-on internet connection. Currently, customers must permit two outbound connections: one to CustomersDot (for cloud licensing) and one to Code Suggestions (a feature-specific endpoint). As we introduce new Cloud Connector features, we should be mindful of introducing additional endpoints that customers need to explicitly permit.

Product Vision

Our goal is to minimize customer friction associated with enabling GitLab Cloud Connector features. Ideally, customers should be able to access all Cloud Connector features by permitting a single endpoint. We want to avoid a situation where each new Cloud Connector feature requires additional network configuration by our customers.

Open question: Does it make sense to wrap everything (CDot and features) into one endpoint? Or should we aim for two (one for CDot and one for Cloud Connector)?

Planning ahead

Status quo (now): 1 AI feature served

The only released feature is Code Suggestions, a stateless AI feature served via the AI gateway. This system is currently reachable at codesuggestions.gitlab.com because Code Suggestions was the only feature it implemented at the time of launch.

Short term (next 3-6 months): N AI features served

Our current direction, as communicated via the AI architecture, is to also serve future AI features from the AI gateway. Our short-term goal should be to rename this host to be sufficiently generic to cover other AI features served from the AI gateway. The new host name could still reference AI in general (for example, ai.gitlab.com), but it must no longer be specific to one feature.

Long term (next 6-12 months): non-AI features

Finally, Cloud Connector will not be limited to AI features in the long term. This raises the question of whether and how the AI gateway should evolve into a more general web service, which we are exploring separately in this issue. Regardless of whether a dedicated service will host Cloud Connector features, from a customer perspective they must appear as if one did. Our long-term goal should be to provide a single entry point into all Cloud Connector features, including those not related to AI. This could be done via a new cloud.gitlab.com endpoint.

Technical proposal

Option 1: New DNS record for AI gateway

As there is significant uncertainty around how the AI gateway will evolve in the face of the challenges listed above, we can address the immediate problem of removing configuration friction by creating additional DNS records that point to the AI gateway.

  • Create ai.cloud.gitlab.com as a DNS A record pointing to the AI gateway IPs
  • Replace codesuggestions.gitlab.com with a CNAME record pointing to ai.cloud.gitlab.com
  • Update gitlab-rails to talk to ai.cloud.gitlab.com

Benefits:

  • Backwards compatible with existing releases: self-managed (SM) instances configured for contacting codesuggestions. will continue to work
  • Forwards compatible with stateless AI features: new AI features made available via the AI gateway remain accessible
  • Leaves door open for future evolution: features that may need to be served from a web service that is not the AI gateway could be made available at theservice.cloud.gitlab.com. Using wildcard firewall or proxy config, admins only need to allow traffic for .cloud.gitlab.com to be able to reach any of these network endpoints, including a potential universal cloud.gitlab.com "Cloud Connector gateway" service

Drawbacks:

  • For HTTP proxies, this would either require customers to use wildcards such as *.cloud.gitlab.com, which we know makes some customers uneasy, or require them to list each host individually, which contradicts the problem we are looking to solve here.
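The trade-off between a wildcard entry and individually listed hosts can be illustrated with a minimal sketch of permit-list matching. The function name `host_permitted` and the matching rules are assumptions for illustration only; real proxies differ in how they interpret wildcards. Here, an entry of the form `*.domain` is assumed to cover subdomains but not the bare domain, while a leading-dot entry (`.domain`) is assumed to cover both, which is how some proxies treat it.

```python
def host_permitted(host: str, permit_list: list[str]) -> bool:
    """Return True if `host` matches any entry in a (hypothetical) permit list.

    Assumed semantics, which vary between real proxies:
      "*.example.com" -> any subdomain, but not example.com itself
      ".example.com"  -> example.com and any subdomain
      "example.com"   -> exact match only
    """
    host = host.lower().rstrip(".")
    for entry in permit_list:
        entry = entry.lower()
        if entry.startswith("*."):
            suffix = entry[1:]  # ".example.com"
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif entry.startswith("."):
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


# One wildcard entry covers current and future subdomains.
wildcard = ["*.cloud.gitlab.com"]
print(host_permitted("ai.cloud.gitlab.com", wildcard))
print(host_permitted("theservice.cloud.gitlab.com", wildcard))

# Without wildcards, every new host needs its own entry.
explicit = ["ai.cloud.gitlab.com"]
print(host_permitted("theservice.cloud.gitlab.com", explicit))
```

Under these assumed rules, whether a wildcard entry also covers a bare cloud.gitlab.com host depends on the form of the entry, so this would need to be checked against the proxy products customers actually use.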

Option 2: New universal host name + path based routing

Alternatively, we could already switch to a generic host name such as cloud.gitlab.com and map traffic via URL path components instead, for example:

  • cloud.gitlab.com/ai/<completions> => AI gateway completions endpoint
  • cloud.gitlab.com/ai/<chat> => AI gateway chat endpoint
  • cloud.gitlab.com/cicd/<manage_runners> => Hypothetical CI/CD runner management service endpoint

The GitLab Rails app would then only ever contact cloud.gitlab.com and routing in Runway would decide where to send this traffic.
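As a rough sketch of what path-based routing behind a single host could look like, the following maps the first URL path component to a backend. The route table, the backend names, and the choice to strip the prefix before forwarding are all assumptions made for illustration; they do not describe how Runway routes traffic.

```python
from urllib.parse import urlsplit

# Hypothetical route table: first path component -> backend service name.
ROUTES = {
    "ai": "ai-gateway",
    "cicd": "runner-management",  # hypothetical non-AI service
}

ENTRY_HOST = "cloud.gitlab.com"


def resolve_backend(url: str) -> tuple[str, str]:
    """Return (backend, remaining_path) for a request to the single entry host."""
    parts = urlsplit(url)
    if parts.hostname != ENTRY_HOST:
        raise ValueError(f"unexpected host: {parts.hostname!r}")
    # Split "/ai/completions" into the prefix "ai" and the rest "completions".
    prefix, _, rest = parts.path.lstrip("/").partition("/")
    if prefix not in ROUTES:
        raise LookupError(f"no backend for path prefix {prefix!r}")
    return ROUTES[prefix], "/" + rest


print(resolve_backend("https://cloud.gitlab.com/ai/completions"))
print(resolve_backend("https://cloud.gitlab.com/cicd/manage_runners"))
```

In this sketch, adding a feature means adding a route entry on the server side, while the host that customers permit stays the same.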

Benefits:

  • All of the benefits from the previous solution
  • A forward-compatible single entry point for all current and future Cloud Connector features
  • Leaves the door open to installing a dedicated Cloud Connector gateway service that could then be hosted on this domain
  • No wildcard config necessary for proxies since all traffic goes through a single domain

Drawbacks:

Unknown at this time -- still exploring this option.

Firewall config

A related configuration problem that cannot be solved via DNS or routing patterns is that of minimizing firewall settings. Since these rules are based on IPs or IP ranges, the task is to minimize the set of IPs or IP ranges we ask customers to open for traffic.

Some considerations:

  • We find that some customers prefer to configure a set of static IPs; this might incur extra cost since we need to pay GCP for these.
  • We already provide IP ranges for customers.*; we should see if we can extend this to cover all current and future Runway services too, in which case we can solve this problem via a documentation update.
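To illustrate the task of minimizing the set of IP ranges, the sketch below collapses a list of CIDR blocks into the smallest equivalent set using Python's standard `ipaddress` module. The addresses are reserved documentation ranges used as placeholders, not GitLab IPs, and the helper name `minimal_ranges` is hypothetical.

```python
import ipaddress


def minimal_ranges(cidrs: list[str]) -> list[str]:
    """Collapse adjacent or overlapping CIDR blocks into the fewest ranges."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


# Placeholder ranges (documentation address space), not real service IPs.
published = [
    "192.0.2.128/25",
    "192.0.2.0/25",     # adjacent to the block above
    "198.51.100.7/32",  # a single static IP
]

print(minimal_ranges(published))
```

The two adjacent /25 blocks merge into one /24, so a customer would need two firewall entries instead of three.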

Outcome

Our team decided on the following actions:

  • We will continue to use codesuggestions. for the time being for all AI features including chat. This has no functional disadvantages. Sticking with this DNS record for a while longer means we don't need to ask customers to update their network settings for as long as Cloud Connector only serves AI features.
  • To address this issue properly, we will instead increase our efforts in standing up a dedicated Cloud Connector gateway service to be reachable at cloud.gitlab.com. We are already spiking different approaches to this and aim to deploy it before the first non-AI feature arrives, at which point we will ask customers once to update their host permit lists and firewall settings to allow traffic to cloud. instead of codesuggestions..
Edited by Matthias Käppler