Proposal: Migrate most of the Packagecloud components to k8s/GCP
Summary
packagecloud:enterprise is the application that powers packages.gitlab.com, which is our distribution site for packages. It is where users can download our DEB and RPM packages for many Linux distributions.
Several components make Packagecloud tick. Most of them (package uploading, indexing, and serving) live on a single VM in AWS, which is a single point of failure (SPoF). Over time, we have moved the services that managed offerings can provide (Redis, DB) off the VM, but the VM itself remains a SPoF. It has been the source of many incidents, including recent ones caused by the instance becoming overwhelmed during package publishing.
This issue is a proposal to migrate most of the Packagecloud components from AWS to k8s in GCP. The only components that will have to remain in AWS (due to a hard dependency within Packagecloud) are the S3 bucket and CloudFront (CDN).
What problems are we trying to solve?
The main problem we're trying to solve is the SPoF. The question is whether we invest the minimum time and effort to remove it within AWS (e.g., go from a single VM to an ASG), or invest more time and effort and end up in a much better place.
Some of the benefits in migrating to k8s/GCP are:
- Easier to maintain; the majority of our services are hosted on k8s & GCP
- Reduces complexity for SREs by running the service using our standard model (GCP + k8s)
- We get autoscaling for free
- We wouldn't have an expensive (currently `c5a.8xlarge`) VM dedicated to packagecloud, which is normally fairly idle until package uploads happen
- We remove another service from chef
- Could potentially be cheaper given our arrangement with GCP?
Some of the drawbacks:
- It will take considerably more time & effort to achieve
- It is not the standard documented Packagecloud deployment method (k8s and a split GCP/AWS architecture). However, we discussed this with Packagecloud Support and they raised no red flags; their hosted Packagecloud offering runs on k8s.
Current Architecture
![](/-/project/1304532/uploads/e3e6a627074d456d5934a2289c06758d/image.png)
Proposed Architecture
![](/-/project/1304532/uploads/9c75456abadf2247eff36bef6d037386/image.png)
The majority of the cross-cloud traffic would come from package uploads and indexing. Package serving is handled by Unicorn, which queries the DB for the package and returns a redirect to CloudFront, so as far as I know serving packages would not generate any cross-cloud traffic.
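To illustrate why package serving should stay off the cross-cloud path, here is a minimal sketch of the redirect flow described above. It is not Packagecloud's actual code: the `serve_package` function, the `Response` type, the in-memory `PACKAGE_INDEX` (standing in for the DB), and the CDN hostname are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical CDN hostname; the real CloudFront distribution is not named here.
CDN_BASE_URL = "https://cdn.example.invalid"

# Hypothetical stand-in for the DB: maps a request path to a stored object key.
PACKAGE_INDEX: Dict[str, str] = {
    "/example/pkg.deb": "objects/example-key",
}


@dataclass(frozen=True)
class Response:
    status: int
    location: Optional[str] = None


def serve_package(request_path: str,
                  index: Dict[str, str] = PACKAGE_INDEX) -> Response:
    """Look up the package and redirect the client to the CDN.

    The app only answers with a small redirect; the package bytes are
    fetched by the client directly from the CDN, not through the app.
    """
    key = index.get(request_path)
    if key is None:
        return Response(status=404)
    return Response(status=302, location=f"{CDN_BASE_URL}/{key}")
```

In this sketch the only work done by the application is a lookup and a small redirect response, which is why the bulk of the download traffic would flow between the client and the CDN rather than between GCP and AWS.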
Other options
Leaving it all as is
This would leave a fairly critical service with a SPoF.
Switch to an ASG and leave it all in AWS
This is certainly an option, as mentioned in the issue description. It would take less effort than the proposed solution, but it would still leave us with a service running on dedicated VMs in AWS.
Why Packagecloud? Couldn't we do it ourselves?
We could, but that's a much bigger project. Packagecloud is embedded in our tooling and used by other teams, so replacing it would require cross-team collaboration. It probably wouldn't be at the top of the priority list either, given that the service works and there is no urgency.