Spike: Cloud Connector Runway service POC
We are proposing to create a dedicated edge service layer for Cloud Connector through which all traffic to GitLab-hosted features is routed. The motivation for this is laid out in !132977 (merged).
The most obvious solution is to deploy a greenfield service to Runway.
We should explore what this would look like in a simple POC, what Runway affords us, and whether there are any larger gaps in functionality. For example, we know that Runway currently does not support stateful services.
Network architecture
Open questions
- How do services discover each other? The codesuggestions.* host currently points at a CF LB. We would want this traffic to remain internal to the GCP network instead.
- Cross-zonal traffic: SaaS may have to cross zones to talk to CC, which is deployed with Runway, which adds cost and latency.
- For ZeroTrust purposes, service-to-service auth should probably be secured with mTLS? How do we set this up?
- How might state be handled, e.g. a Redis instance we need to talk to?
Outcome
We successfully deployed a Golang server through Runway that performs simple routing tasks, available at https://cc-gateway-poc-9fittu.runway.gitlab.net:
curl -H"X-Gitlab-Token: $(curl -s -H'Authorization: Bearer <PAT>' -XPOST https://gitlab.com/api/v4/code_suggestions/tokens | jq -r '.access_token')" -d'
{
"prompt_version": 1,
"current_file": {
"file_name": "test.py",
"content_above_cursor": "def is_even(n: int) ->",
"content_below_cursor": ""
}
}' https://cc-gateway-poc-9fittu.runway.gitlab.net/ai/v2/completions
{"id":"id","model":{"engine":"vertex-ai","name":"code-gecko@latest","lang":"python"},"experiments":[{"name":"exp_truncate_suffix","variant":1}],"object":"text_completion","created":1698755980,"choices":[{"text":" bool:\n return n % 2 == 0","index":0,"finish_reason":"length"}]}
Main limitations encountered:
- Logging is not currently available out of the box. This is tracked in gitlab-com/gl-infra/platform/runway/team#84 (closed). Moving logs into ELK currently requires custom setup.
- mTLS support is either not currently available or requires manual, application-level integration with Vault. This is tracked in gitlab-com/gl-infra/platform/runway/team#118.
- Connected storage is not available out of the box. This is already tracked in https://gitlab.com/gitlab-com/gl-infra/platform/runway/team/-/issues/62, which explores providing a KV store that lives outside of Runway but is accessible from Runway services.
Other facts we gathered:
- Runway uses Google Cloud Run. Runway is essentially a configuration layer on top of Google Cloud Run that uses Terraform to deploy Docker images into Cloud Run through CI/CD pipelines.
- How do services discover each other? There are two considerations to make:
  - GitLab <=> CC edge:
    - For SaaS, which is already deployed to GCP, it is likely advantageous to not go through CF again but rather contact CC at its LB IP via e.g. cloud.lb.gitlab.com.
    - For SM, which may be deployed anywhere in the world, it should go through a CF POP instead, i.e. a cloud.gitlab.com DNS record pointing to Cloudflare.
  - CC <=> backends edge: We will create new DNS records for backends like the AI gateway. For example, ai-gateway.lb.gitlab.com would point to the GCP LB and be used by the CC service so we don't circle back through Cloudflare again. We will keep codesuggestions.gitlab.com pointed to CF to support existing GitLabs making requests to this host directly.
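The two edges described above can be summarised as a small lookup. This is only a sketch of the proposed naming scheme: the host names are the examples given in the text, while the `deployment` type and the `ccHost` and `backendHost` helpers are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// deployment distinguishes where a GitLab instance runs.
type deployment string

const (
	saas        deployment = "saas"         // GitLab.com, already inside GCP
	selfManaged deployment = "self-managed" // may run anywhere in the world
)

// ccHost returns the host a GitLab instance would use to reach Cloud
// Connector: the GCP load balancer record for SaaS, the Cloudflare-fronted
// record for everything else.
func ccHost(d deployment) string {
	if d == saas {
		return "cloud.lb.gitlab.com"
	}
	return "cloud.gitlab.com"
}

// backendHosts maps a backend name to the internal LB record that the CC
// service would call, so traffic does not circle back through Cloudflare.
var backendHosts = map[string]string{
	"ai-gateway": "ai-gateway.lb.gitlab.com",
}

// backendHost looks up the internal host for a backend by name.
func backendHost(name string) (string, bool) {
	host, ok := backendHosts[name]
	return host, ok
}

func main() {
	fmt.Println("SaaS -> CC:", ccHost(saas))
	fmt.Println("SM   -> CC:", ccHost(selfManaged))
	if host, ok := backendHost("ai-gateway"); ok {
		fmt.Println("CC   -> AI gateway:", host)
	}
}
```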
Onboarding experience
Snags encountered during onboarding:
- Some documentation gaps initially, which were all fixed 🙌
- Adding service to provisioner inventory:
  - The CI build sometimes fails with Terraform errors, e.g.: https://gitlab.com/gitlab-com/gl-infra/platform/runway/provisioner/-/merge_requests/91
  - Ran up against the 18 character name limit at first
    - Learned this is an implementation leak from GCP
- CI/CD setup was fiddly
  - Must manually update settings on the service project to allow the deployment project to access it via CI. First set this on the wrong project. Could this be automated through an API call?
    - Ran into a permission error because that feature is only available on Ultimate
      - Won't be an issue with projects under gitlab-org/*
  - Next it failed to find the Docker image published from the service CI
    - This makes for a brittle integration point: Runway assumes a Docker image exists, but this integration is based on environment variables that can be mismatched, and it requires me to write a lot of boilerplate upfront. I think it would be far easier for developers if Runway's ci-tasks pulled in a CI stage that builds and publishes a Docker image for me automatically, with the integration point being the presence of a Dockerfile in my service repo.
      - This was due to the use of a custom CI/CD variable to expand the Docker image name+tag; this was fixed in Runway 1.14.1, which now supports that. 🙌
- Other project setup:
  - runway.yml metadata requires service owner labels, but the SSoT for this was outdated
    - Fixed in gitlab-com/content-sites/handbook!936 (merged)
    - Is there a way to keep this in lock step with org changes?
- Suggestion: automate more during project creation
  - For instance, use the New project wizard to create the CI/CD boilerplate and runway.yml
  - Allow me to choose Runway as a Deployment target on SaaS
  - Make logs available right out of the gate. Inspecting logs to see what my service is doing is the first thing I want to do as a developer.
Edited by Matthias Käppler