Spike: Cloudflare Worker POC
We are proposing to create a dedicated edge service layer for Cloud Connector through which all traffic to GitLab-hosted features is routed. The motivation for this is laid out in !132977 (merged).
One promising alternative to writing and deploying a service from scratch is to use Cloudflare Workers, a serverless platform for deploying application code that:
- Is auto-scaled through Cloudflare's service infrastructure.
- Supports any language that compiles to WebAssembly, including Rust.
- Supports various options for cloud storage, including a key-value store we could use to cache data (see the sketch below).
- Supports a wide range of network protocols, including WebSockets.
- Has built-in secrets management.
- Supports regional deployments.
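To make the storage point concrete, here is a minimal sketch of a Worker that caches upstream responses in Workers KV. The `CACHE` binding name, the upstream host, and the TTL are illustrative assumptions, not anything from the POC:

```ts
// Minimal sketch: caching an upstream response in Workers KV.
// `CACHE` is a hypothetical KV namespace binding declared in wrangler.toml;
// the `KVNamespace` type comes from @cloudflare/workers-types.
export interface Env {
  CACHE: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname;

    // Serve from KV if we already have a cached copy.
    const cached = await env.CACHE.get(key);
    if (cached !== null) {
      return new Response(cached, { headers: { "X-Cache": "HIT" } });
    }

    // Otherwise fetch upstream and cache the body (60s is KV's minimum TTL).
    const upstream = await fetch("https://backend.example.com" + key);
    const body = await upstream.text();
    await env.CACHE.put(key, body, { expirationTtl: 60 });
    return new Response(body, { headers: { "X-Cache": "MISS" } });
  },
};
```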
We should build a POC that demonstrates that all stated goals of the blueprint can be accomplished with this approach.
Key issues to explore
- What latency overhead do we need to expect?
- Can we implement logic like cryptographic verification?
- Can we handle requests at the TCP stream level?
- Cost forecast based on current and expected traffic
- Which storage options are available to store and enforce request budgets, and what would they cost?
- How does telemetry work, and how would it integrate with our dashboards and alerts?
- Can we do mTLS between the worker and a GCP LB or something similar? (See the sketch below for one possibility.)
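On the last point, Cloudflare Workers support mTLS certificate bindings: a client certificate uploaded via `wrangler mtls-certificate upload` is exposed to the Worker as a fetcher that presents that certificate on outbound requests. Whether this works against a GCP LB is exactly what the spike should verify; the binding name and target hostname below are hypothetical:

```ts
// Sketch: outbound mTLS via a certificate binding (an `mtls_certificates`
// entry in wrangler.toml). Binding name and LB hostname are hypothetical.
export interface Env {
  GCP_CLIENT_CERT: Fetcher;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Requests made through the binding present our client certificate,
    // so the load balancer can authenticate the Worker.
    const upstream = "https://lb.example.internal" + new URL(request.url).pathname;
    return env.GCP_CLIENT_CERT.fetch(new Request(upstream, request));
  },
};
```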
Outcome
We wrote a POC that demonstrates what a CC smart router could look like: https://gitlab.com/mkaeppler/cloud-connector-cloudflare-worker-poc. The POC:
- Reads a JWT from an HTTP header, decodes and verifies it, and renders a 401 unless successful
- Parses the request URI and maps any requests to `/ai/*` to the AI gateway
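A condensed sketch of that verify-then-route flow, assuming the `jose` library for JWT verification; the JWKS endpoint and AI gateway host are placeholders, not the POC's actual configuration:

```ts
// Condensed sketch of the POC's logic: verify the JWT, then route /ai/*.
// JWKS endpoint and AI gateway host below are placeholders.
import { createRemoteJWKSet, jwtVerify } from "jose";

const JWKS = createRemoteJWKSet(new URL("https://gitlab.example.com/-/jwks"));
const AI_GATEWAY = "https://ai-gateway.example.com";

export default {
  async fetch(request: Request): Promise<Response> {
    const token = request.headers.get("X-Gitlab-Token");
    if (!token) return new Response("Unauthorized", { status: 401 });

    try {
      // Decode the JWT and verify its signature against the JWKS.
      await jwtVerify(token, JWKS);
    } catch {
      return new Response("Unauthorized", { status: 401 });
    }

    const url = new URL(request.url);
    if (url.pathname.startsWith("/ai/")) {
      // Map /ai/* to the AI gateway, preserving the remaining path.
      const upstream = AI_GATEWAY + url.pathname.replace(/^\/ai/, "");
      return fetch(new Request(upstream, request));
    }

    return new Response("Not found", { status: 404 });
  },
};
```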
You can invoke the deployed worker as follows:
```shell
curl -v -H"X-Gitlab-Token: $(curl -s -H'Authorization: Bearer <PAT>' -XPOST https://gitlab.com/api/v4/code_suggestions/tokens | jq -r '.access_token')" -d'
{
  "prompt_version": 1,
  "current_file": {
    "file_name": "test.py",
    "content_above_cursor": "def is_even(n: int) ->",
    "content_below_cursor": ""
  }
}' https://cc-gateway.mkaeppler.workers.dev/ai/v2/completions
```
We found the following pros and cons with this approach:
Pros
- Very easy and fast to stand something up that works
- Very easy to run and debug the worker locally (the tooling is great)
- Supports any implementation language that compiles to WASM
- Provides several cloud storage options that would cover our needs
- Supports smart placement logic that executes the worker closest to where backends operate
- Attractive pricing model; the default pricing model does not charge for wall time
Cons
- The V8-based runtime has limitations in the APIs, and hence the libraries, it supports
- Workers have numerous limits, such as capped memory and CPU use, beyond which requests are discarded
- Logging and telemetry need to either stay within the Cloudflare ecosystem (web dashboard or CLI), or require additional operational complexity to integrate with our existing stacks (Prometheus/ELK)
- Opting for a third-party solution means we aren't dogfooding Runway
- No out-of-the-box solution for staging or canary environments