Spike: Cloudflare Worker POC
We are proposing to create a dedicated edge service layer for Cloud Connector through which all traffic to GitLab-hosted features is routed. The motivation for this is laid out in !132977 (merged).
One promising alternative to writing and deploying a service from scratch is to use Cloudflare Workers, a serverless platform for deploying application code that:
- Is auto-scaled through Cloudflare's service infrastructure.
- Supports any language that compiles to WebAssembly, including Rust.
- Supports various options for cloud storage, including a key-value store we could use to cache data (see the sketch below).
- Supports a wide range of network protocols, including WebSockets.
- Has built-in secrets management.
- Supports regional deployments.
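To make the storage point concrete, here is a minimal sketch of a Worker that caches upstream responses in Workers KV. The `CACHE` binding name, the upstream host, and the TTL are illustrative assumptions, not anything from the POC:

```ts
// Minimal sketch: caching an upstream response in Workers KV.
// `CACHE` is a hypothetical KV namespace binding declared in wrangler.toml;
// the `KVNamespace` type comes from @cloudflare/workers-types.
export interface Env {
  CACHE: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname;

    // Serve from KV if we already have a cached copy.
    const cached = await env.CACHE.get(key);
    if (cached !== null) {
      return new Response(cached, { headers: { "X-Cache": "HIT" } });
    }

    // Otherwise fetch upstream and cache the body (60s is KV's minimum TTL).
    const upstream = await fetch("https://backend.example.com" + key);
    const body = await upstream.text();
    await env.CACHE.put(key, body, { expirationTtl: 60 });
    return new Response(body, { headers: { "X-Cache": "MISS" } });
  },
};
```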
We should build a POC that demonstrates that all stated goals of the blueprint can be accomplished with this approach.
Key issues to explore
- What latency overhead do we need to expect?
- Can we implement logic like cryptographic verification?
- Can we handle requests at the TCP stream level?
- Cost forecast based on current and expected traffic
- Which storage options are available to store and enforce request budgets, and what would they cost?
- How does telemetry work, and how would it integrate with our dashboards and alerts?
- Can we do mTLS between the worker and a GCP LB or something similar? (See the sketch below for one possibility.)
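On the last point, Cloudflare Workers support mTLS certificate bindings: a client certificate uploaded via `wrangler mtls-certificate upload` is exposed to the Worker as a fetcher that presents that certificate on outbound requests. Whether this works against a GCP LB is exactly what the spike should verify; the binding name and target hostname below are hypothetical:

```ts
// Sketch: outbound mTLS via a certificate binding (an `mtls_certificates`
// entry in wrangler.toml). Binding name and LB hostname are hypothetical.
export interface Env {
  GCP_CLIENT_CERT: Fetcher;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Requests made through the binding present our client certificate,
    // so the load balancer can authenticate the Worker.
    const upstream = "https://lb.example.internal" + new URL(request.url).pathname;
    return env.GCP_CLIENT_CERT.fetch(new Request(upstream, request));
  },
};
```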
Outcome
We wrote a POC that demonstrates what a CC smart router could look like: https://gitlab.com/mkaeppler/cloud-connector-cloudflare-worker-poc. The POC:
- Reads a JWT from an HTTP header, decodes and verifies it, and renders a 401 unless successful
- Parses the request URI and maps any requests to `/ai/*` to the AI gateway
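A condensed sketch of that verify-then-route flow, assuming the `jose` library for JWT verification; the JWKS endpoint and AI gateway host are placeholders, not the POC's actual configuration:

```ts
// Condensed sketch of the POC's logic: verify the JWT, then route /ai/*.
// JWKS endpoint and AI gateway host below are placeholders.
import { createRemoteJWKSet, jwtVerify } from "jose";

const JWKS = createRemoteJWKSet(new URL("https://gitlab.example.com/-/jwks"));
const AI_GATEWAY = "https://ai-gateway.example.com";

export default {
  async fetch(request: Request): Promise<Response> {
    const token = request.headers.get("X-Gitlab-Token");
    if (!token) return new Response("Unauthorized", { status: 401 });

    try {
      // Decode the JWT and verify its signature against the JWKS.
      await jwtVerify(token, JWKS);
    } catch {
      return new Response("Unauthorized", { status: 401 });
    }

    const url = new URL(request.url);
    if (url.pathname.startsWith("/ai/")) {
      // Map /ai/* to the AI gateway, preserving the remaining path.
      const upstream = AI_GATEWAY + url.pathname.replace(/^\/ai/, "");
      return fetch(new Request(upstream, request));
    }

    return new Response("Not found", { status: 404 });
  },
};
```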
You can invoke the deployed worker as follows:
```shell
curl -v -H"X-Gitlab-Token: $(curl -s -H'Authorization: Bearer <PAT>' -XPOST https://gitlab.com/api/v4/code_suggestions/tokens | jq -r '.access_token')" -d'
{
  "prompt_version": 1,
  "current_file": {
    "file_name": "test.py",
    "content_above_cursor": "def is_even(n: int) ->",
    "content_below_cursor": ""
  }
}' https://cc-gateway.mkaeppler.workers.dev/ai/v2/completions
```
We found the following pros and cons with this approach:
Pros
- Very easy and fast to stand something up that works
- Very easy to run and debug the worker locally (the tooling is great)
- Supports any implementation language that compiles to WASM
- Provides several cloud storage options that would cover our needs
- Supports smart placement logic that executes the worker closest to where backends operate
- Attractive pricing model; the default pricing model does not charge for wall time
Cons
- The V8-based runtime has limitations in the APIs, and hence the libraries, it supports
- Workers have numerous limits, such as capped memory and CPU use, beyond which requests are discarded
- Logging and telemetry need to either stay within the Cloudflare ecosystem (web dashboard or CLI), or require additional operational complexity to integrate with our existing stacks (Prometheus/ELK)
- Opting for a third-party solution means we aren't dogfooding Runway
- No out-of-the-box solution for staging or canary environments