Replace own cache servers with GCS for GCP Runners fleet
Since we've moved our Runners fleet to GCP, I think we should consider switching from our own cache servers to Google Cloud Storage.
Some arguments for doing this:
- Cost savings

  I've done a simple calculation using the GCP cost calculator. At the moment, for the GCP cache machines we're using:
  - 2 x n1-standard-2 machines: $97.10/month
  - 2 x 4096GB disks for data: $327.68/month

  In total this gives us $424.78/month for providing cache for our jobs in both GCP regions that we're using.

  For 8192 GB of data stored in GCS we would pay $163.84/month, which is ~38.6% of current costs. But looking at historical data for filesystem usage on the cache machines, since we restored the cleaning mechanism we have been using no more than ~1.2TB per machine. And for ~2500GB in GCS we would pay $50/month, which is ~11.8% of current costs!

  These are not big values, but cutting the costs by ~88% is something that we should definitely consider!
- Stability

  In the past we had many problems with the stability of our cache machines. Although they have been working well for the last few months, without any issues, different problems may hit us again. I'd risk a statement that over a longer period the GCS service will be much more reliable than our own machines with minio.
- Dogfooding of our product

  People are asking for GCS support in the distributed cache feature of GitLab Runner. Deciding to switch from our own machines to GCS would force us to make this happen and ensure that it works well.
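The arithmetic behind the cost argument can be reproduced with a short script. This is only a sketch: the per-GB rate is an assumption derived from the quoted $163.84 for 8192 GB, the helper names are made up for illustration, and only storage is counted (request and network charges are not estimated here).

```python
# Monthly figures quoted in this issue (USD).
MACHINES_COST = 97.10   # 2 x n1-standard-2
DISKS_COST = 327.68     # 2 x 4096GB data disks

# Assumption: per-GB GCS storage rate derived from the quoted
# $163.84/month for 8192 GB; not taken from a price list.
GCS_RATE_PER_GB = 163.84 / 8192


def gcs_monthly_cost(stored_gb: float) -> float:
    """Storage-only monthly cost for the given amount of data."""
    return stored_gb * GCS_RATE_PER_GB


def share_of_current(cost: float, current: float) -> float:
    """Cost expressed as a percentage of the current monthly spend."""
    return 100.0 * cost / current


current_total = MACHINES_COST + DISKS_COST

# 8192 GB matches the provisioned disks; 2500 GB is roughly the observed usage.
for stored_gb in (8192, 2500):
    cost = gcs_monthly_cost(stored_gb)
    share = share_of_current(cost, current_total)
    print(f"{stored_gb} GB: ${cost:.2f}/month, {share:.1f}% of current costs")
```

Swapping in a different rate or data volume shows how sensitive the savings are to the amount of cache we actually keep.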
To make this happen, we need to:
- add support for GCS into Runner: gitlab-org/gitlab-runner#1773 (closed)
- create a dedicated bucket in the gitlab-ci project
- check the documentation again and calculate billing estimates: https://gitlab.com/gitlab-com/infrastructure/issues/4565#note_91582606
- configure a GCS lifecycle policy that removes objects not used for the last 14 days (something that we're doing now on our own machines)
- update the configuration of our Runners fleet and point GCP Runners to the GCS bucket
  - prm runners
  - gsrm runners
  - stg-srm runners
  - srm runners
- remove unused machines
  - runners-cache-3.gitlab.com
  - runners-cache-4.gitlab.com
- remove related chef configuration:
  - nodes/runners-cache-3.gitlab.com.json
  - nodes/runners-cache-4.gitlab.com.json
  - roles/runners-cache-3-gitlab-com.json
  - roles/runners-cache-4-gitlab-com.json
  - runners-cache-3-gitlab-com vault
  - runners-cache-4-gitlab-com vault
- oncall: disable auto-renew of TLS certificates (if needed):
  - runners-cache-3.gitlab.com
  - runners-cache-4.gitlab.com
- oncall: remove DNS entries:
  - runners-cache-3.gitlab.com
  - runners-cache-4.gitlab.com
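For the lifecycle policy step, a minimal sketch of what the bucket configuration could look like is shown below. It generates the JSON document that GCS accepts as a lifecycle configuration. One caveat: the `age` condition counts days since an object was created, not since it was last read, so this only approximates "not used for the last 14 days". It assumes cache archives are overwritten on each upload, which would make `age` behave like "not updated for 14 days". The function name is hypothetical.

```python
import json

# Retention period taken from the task list above.
RETENTION_DAYS = 14


def build_lifecycle_policy(retention_days: int) -> dict:
    """Build a GCS lifecycle configuration with a single Delete rule.

    Note: "age" is measured from object creation, not from last access.
    """
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": retention_days},
            }
        ]
    }


policy = build_lifecycle_policy(RETENTION_DAYS)

# Print the JSON so it can be saved to a file and applied to the bucket.
print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and applied with `gsutil lifecycle set <file> gs://<bucket>`, where the bucket is the dedicated one created in the gitlab-ci project.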