Look into creating a CI/CD pipeline for assets so that object storage is used as an origin for the CDN
Currently on GitLab.com, assets are delivered by a CDN whose origin is gitlab.com; assets are served to the CDN from disk on each front-end server.
This is less than ideal because it prevents us from running two versions of GitLab.com side by side, since both would need to have the assets for each version unless stickiness is set.
There is no reason why we cannot deliver assets continuously to object storage. This issue is to come up with a high-level plan/design for how that might be possible, and how it fits into the current proposal for CI/CD.
Tasks for Option 3 - Upload assets in the deployer pipeline

- Create a script to fetch and upload assets to object storage using the registry image or the package
- Create a new deployment asset job that will run in parallel to migrations
- Add the asset bucket to the preprod CDN as a new origin (https://docs.fastly.com/guides/integrations/google-cloud-storage)
- Update haproxy so it has a configurable proxy; requests to /assets are proxied to the asset bucket
- Set the gitlab-staging asset bucket as the origin for the staging CDN
- Configure staging so that it proxies /assets to the asset bucket
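For the haproxy task, a rough sketch of the /assets routing, assuming a GCS-backed bucket reachable via a Host-header rewrite (frontend/backend names and the bucket hostname are made up for illustration):

```
# Hypothetical haproxy fragment: send /assets traffic to the asset
# bucket instead of the front-end fleet.
frontend https-in
  acl is_assets path_beg /assets
  use_backend assets_bucket if is_assets
  default_backend web

backend assets_bucket
  # Rewrite the Host header so GCS serves objects from the asset bucket;
  # the bucket name follows the assumptions in this issue.
  http-request set-header Host gitlab-staging-assets.storage.googleapis.com
  server gcs storage.googleapis.com:443 ssl verify none
```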
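A minimal sketch of what the fetch-and-upload script could look like, assuming the registry-image route. The image path, the asset path inside the image, and the bucket layout are all assumptions for illustration, not the real values:

```shell
#!/usr/bin/env bash
# Sketch: pull the precompiled assets out of the gitlab-ee registry image
# and rsync them to the asset bucket. Names and paths are assumptions.
set -euo pipefail

# Build a per-version destination prefix so two GitLab versions can be
# served side by side from the same bucket without stickiness.
asset_prefix() {
  local bucket="$1" version="$2"
  echo "gs://${bucket}/assets/${version}"
}

upload_assets() {
  local bucket="$1" version="$2"
  # Hypothetical registry path for the gitlab-ee image.
  local image="dev.gitlab.org:5005/gitlab/gitlab-ee:${version}"
  local cid
  cid="$(docker create "${image}")"
  # Copy the compiled assets out of the image, then sync to the bucket.
  docker cp "${cid}:/srv/gitlab/public/assets" ./assets
  docker rm -f "${cid}" >/dev/null
  gsutil -m rsync -r ./assets "$(asset_prefix "${bucket}" "${version}")"
}

# Only run when invoked with arguments, e.g.:
#   ./upload-assets.sh gitlab-staging 12.1.0-ee
if [ "$#" -eq 2 ]; then
  upload_assets "$1" "$2"
fi
```

Using a per-version prefix (rather than overwriting a flat `assets/` tree) is what lets two deployed versions share one origin.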
Canary

- Create the asset buckets in production
- Remove allow-failure from the asset job (https://ops.gitlab.net/gitlab-com/gl-infra/deployer/merge_requests/94)
- Execute the change issue for canary production#835 (closed)
Production

- Execute the change issue steps for production production#835 (closed)
Assumptions
- Assets for a GitLab deployment are ~230MB
- The package we use for GitLab is the same package we ship to self-managed customers, meaning that the package also contains the assets
- We at least have two asset buckets, one for production and one for non-production environments (staging, pre, etc)
- The non-production assets may live in a single bucket or in one bucket per environment (staging, pre, etc.)
- We do not have any retention policies to start
Option 1 - Upload assets in the gitlab-ee pipeline
```mermaid
graph LR;
  subgraph gitlab-ee dev.gitlab.org;
  1a(compile assets) --> 1b(upload to registry);
  1a --> 1d(upload assets to bucket);
  end
```
For every pipeline that runs on gitlab-ee, we would upload assets to the bucket in a new job that runs after the existing compile-assets step, in parallel with the existing job that uploads assets to the registry.
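A sketch of how this could look in the gitlab-ee `.gitlab-ci.yml` (job names, stages, paths, and the upload commands are assumptions, not the actual pipeline definition):

```yaml
# Hypothetical gitlab-ee pipeline fragment: the bucket upload is its own
# job in the same stage as upload-to-registry, so both consume the
# compiled assets but run in parallel with each other.
compile-assets:
  stage: build
  script:
    - bundle exec rake gitlab:assets:compile
  artifacts:
    paths:
      - public/assets

upload-to-registry:
  stage: publish
  dependencies: [compile-assets]
  script:
    - ./scripts/upload-to-registry.sh   # existing job, name assumed

upload-assets-to-bucket:
  stage: publish
  dependencies: [compile-assets]
  script:
    # Per the assumptions above, both staging and production buckets
    # receive the assets in this option.
    - gsutil -m rsync -r public/assets gs://gitlab-staging-assets/assets/${CI_COMMIT_REF_NAME}
    - gsutil -m rsync -r public/assets gs://gitlab-production-assets/assets/${CI_COMMIT_REF_NAME}
```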
Pros
- Probably adds little or no additional time to the pipeline, since the upload runs in parallel
- Potentially, downstream jobs won't have to wait for assets
Cons
- Downstream, we need to be sure this step was completed before deploying to GitLab.com, how would we do that?
- To ensure it was completed, we could put it before the upload-to-registry step, though this adds time to the pipeline
- We may end up uploading more objects to object-storage than we actually need
- Assumes that all assets will go to both a staging bucket and production bucket
Option 2 - Upload assets in the omnibus-gitlab pipeline
```mermaid
graph LR;
  subgraph omnibus-gitlab dev.gitlab.org;
  2a(fetch assets from registry) --> 2b(upload assets to bucket);
  2b --> 2c(build package with assets);
  end
```
Pros
- Because this job runs whenever an omnibus package is built, we can be sure assets are uploaded before a package is built for GitLab.com
- By putting the step in the omnibus pipeline, it keeps it closer to where the package is made since assets are part of the package
Cons
- We would transfer assets to multiple buckets (or use bucket replication). In either case the production bucket would have many more assets than is necessary for GitLab.com.
Option 3 - Upload assets in the deployer pipeline
```mermaid
graph LR;
  subgraph deployer ops.gitlab.net;
  z(start deployment) --> e;
  z --> g;
  e(migrate database) --> f(deploy to GitLab.com);
  g(upload assets to bucket) --> f;
  end
```
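A sketch of the deployer pipeline fragment for this option; job names, scripts, and variables are assumptions:

```yaml
# Hypothetical deployer pipeline fragment: the asset upload runs in
# parallel with database migrations, and both must finish before the
# deploy job runs.
migrate-database:
  stage: prepare
  script:
    - ./bin/migrate

upload-assets:
  stage: prepare
  # Allowed to fail initially; removed once the job is proven, per the
  # Canary tasks above.
  allow_failure: true
  script:
    - ./bin/upload-assets gitlab-production ${GITLAB_VERSION}

deploy:
  stage: deploy
  script:
    - ./bin/deploy-gitlab-com
```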
Pros
- It's tied to the deployment, which means we would either fetch the assets from the registry or copy them directly from disk
- We only copy the assets to production when they are needed, right before a production deployment.
Cons
- Assets are handled completely separately from packaging, which means not every package will have assets available