Look into creating a CICD pipeline for assets so that object storage is used as an origin for the CDN

Currently, assets on gitlab.com are delivered by a CDN whose origin is gitlab.com itself; the CDN fetches assets from disk on each front-end server.

This is less than ideal because it prevents us from running two versions of gitlab.com side by side: unless stickiness is configured, every front-end server must have the assets for both versions on disk.

There is no reason why we cannot deliver assets continuously to object storage. This issue is to come up with a high-level plan/design for how that might be possible and how it fits into the current proposal for CICD.

Tasks for Option 3 - Upload assets in the deployer pipeline

  • Create a script to fetch and upload assets to object storage using the registry image or the package
  • Create a new deployment asset job that will run in parallel to migrations
  • Add the asset bucket to the preprod CDN as a new origin https://docs.fastly.com/guides/integrations/google-cloud-storage
  • Update haproxy with a configurable proxy so that requests to /assets are proxied to the asset bucket
  • Set the gitlab-staging asset bucket as the origin for the staging CDN
  • Configure staging so that it proxies /assets to the asset bucket
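The first task above (fetch and upload assets) might look roughly like the sketch below. The image name, the asset path inside the image, and the bucket name are all assumptions for illustration, not the real values.

```shell
#!/bin/sh
# Sketch: pull compiled assets out of a registry image and copy them to an
# object-storage bucket. Image, paths, and bucket are hypothetical.
set -eu

VERSION="${1:-12.0.0}"
IMAGE="registry.example.com/gitlab-assets:${VERSION}"   # hypothetical image
BUCKET="${ASSET_BUCKET:-gs://gitlab-staging-assets}"    # hypothetical bucket

# DRY_RUN defaults to 1 so the sketch prints the commands it would run
# instead of requiring docker and gsutil to be installed.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

# Copy the assets out of the image without running a container, then
# upload them. `gsutil -m cp -n` skips objects that already exist; asset
# filenames are content-hashed, so existing objects never need overwriting.
run docker create --name assets-tmp "$IMAGE"
run docker cp assets-tmp:/srv/gitlab/public/assets ./assets
run docker rm assets-tmp
run gsutil -m cp -n -r ./assets "$BUCKET/assets"
```

Run as-is, the script only prints the docker and gsutil commands; setting DRY_RUN=0 would execute them, given credentials for the bucket.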

Canary

Production

Assumptions

  • Assets for a GitLab deployment are ~230MB
  • The package we use for GitLab.com is the same package we ship to self-managed customers, meaning that the package also contains the assets
  • We have at least two asset buckets: one for production and one for non-production environments (staging, pre, etc.)
  • The non-production assets may live in a single shared bucket or in one bucket per environment (staging, pre, etc.)
  • We do not have any retention policies to start

Option 1 - Upload assets in the gitlab-ee pipeline

graph LR;
    subgraph gitlab-ee dev.gitlab.org;
    1a(compile assets) --> 1b(upload to registry);
    1a --> 1d(upload assets to bucket); 
    end

For every pipeline that runs on gitlab-ee, we would upload assets to the bucket in a new job that runs after the existing compile-assets step, in parallel with the existing job that uploads assets to the registry.
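A rough sketch of what such a job could look like in `.gitlab-ci.yml`; the job name, stage, artifact path, and bucket are all assumptions:

```yaml
# Hypothetical gitlab-ee job: runs alongside the registry upload,
# consuming the artifacts of the existing compile-assets job.
upload-assets-to-bucket:
  stage: upload
  dependencies:
    - compile-assets        # existing job that produces public/assets
  script:
    - gsutil -m cp -n -r public/assets gs://gitlab-staging-assets/assets
```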

Pros

  • Will probably add little or no additional time to the pipeline, since the upload runs in parallel
  • Potentially means we won't have to wait for assets downstream

Cons

  • Downstream, we need to be sure this step was completed before deploying to GitLab.com, how would we do that?
  • To guarantee completion, we could run it before the upload-to-registry step, though this adds time to the pipeline
  • We may end up uploading more objects to object-storage than we actually need
  • Assumes that all assets will go to both a staging bucket and production bucket

Option 2 - Upload assets in the omnibus-gitlab pipeline

graph LR;
    subgraph omnibus-gitlab dev.gitlab.org;

    2a(fetch assets from registry) --> 2b(upload assets to bucket);
    2b --> 2c(build package with assets);
    end

Pros

  • We can be sure this job executes when building an omnibus package, so we can be sure assets are uploaded before building a package for GitLab.com
  • Putting the step in the omnibus pipeline keeps it close to where the package is built, since the assets are part of the package

Cons

  • We would transfer assets to multiple buckets (or use bucket replication). In either case the production bucket would contain many more assets than GitLab.com actually needs.

Option 3 - Upload assets in the deployer pipeline

graph LR;
    subgraph deployer ops.gitlab.net;
    z(start deployment) --> e;
    z --> g;
    e(migrate database) --> f(deploy to GitLab.com);
    g(upload assets to bucket) --> f;
    end

Pros

  • It's tied to the deployment, which means we would either fetch the assets from the registry or copy them directly from disk
  • We only copy the assets to production when they are needed, right before a production deployment.

Cons

  • Assets are handled completely separately from packaging, meaning not every package will have its assets available in object storage