Delete old object versions on the GCS container registry bucket
## Context
While creating a new registry bucket in pre-prod (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12763), @hphilipps [noticed](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12763#note_536141753) that [Object Versioning](https://cloud.google.com/storage/docs/object-versioning) was turned **on** for the registry bucket in all environments.
## Problem
This finding may have major storage and cost implications.
Due to how the registry operates, during a `docker push` blobs are first uploaded to a temporary location under `/docker/registry/v2/repositories/<repository path>/_uploads`, where multipart upload chunks are staged and then validated. When the upload completes, blobs are moved to their final destination, the shared common storage at `/docker/registry/v2/blobs`. As noted in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12763#note_536141753, this move consists of two GCS operations: a copy from the source to the destination location, followed by a delete of the source object (as [recommended by Google](https://cloud.google.com/storage/docs/copying-renaming-moving-objects#storage-move-object-go)).
After reading through the [Object Versioning docs](https://cloud.google.com/storage/docs/object-versioning) and experimenting with the registry, I found that with object versioning enabled, deleting the temporary blob upload objects does not reduce the bucket size: each deleted object becomes a non-current version that is still stored (and billed).
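To make the mechanism concrete, here is a minimal Python sketch of a toy bucket. It is a simplification written for this issue, not the GCS API or the registry's code: sizes are abstract units, chunked writes and metadata objects are ignored, and `ToyBucket` and `push_blob` are hypothetical names. It also assumes that when a blob already exists in common storage, the registry leaves it untouched and only deletes the staged copy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    """Toy model of a bucket with or without Object Versioning (not the GCS API).

    Each object name maps to a list of [size, is_live] records, oldest first;
    only the last record of a name can be live.
    """
    versioning: bool
    objects: dict = field(default_factory=dict)

    def exists(self, name: str) -> bool:
        versions = self.objects.get(name, [])
        return bool(versions) and versions[-1][1]

    def _retire_live(self, name: str) -> None:
        # Overwriting or deleting a live object discards it when versioning is off,
        # or keeps it as a non-current version (still stored and billed) when on.
        if self.exists(name):
            if self.versioning:
                self.objects[name][-1][1] = False
            else:
                self.objects[name].pop()

    def write(self, name: str, size: int) -> None:
        self._retire_live(name)
        self.objects.setdefault(name, []).append([size, True])

    def delete(self, name: str) -> None:
        self._retire_live(name)

    def size(self, include_noncurrent: bool = True) -> int:
        # Analogous to `gsutil du -sa` (True) versus `gsutil du -s` (False).
        return sum(size for versions in self.objects.values()
                   for size, live in versions if live or include_noncurrent)


def push_blob(bucket: ToyBucket, repo: str, upload_id: str, digest: str, size: int) -> None:
    """Hypothetical, simplified blob push: stage under _uploads/, then copy + delete."""
    staged = f"docker/registry/v2/repositories/{repo}/_uploads/{upload_id}/data"
    final = f"docker/registry/v2/blobs/sha256/{digest[:2]}/{digest}/data"
    bucket.write(staged, size)
    if not bucket.exists(final):  # assumption: an existing shared blob is not rewritten
        bucket.write(final, size)  # "copy" half of the move
    bucket.delete(staged)          # "delete" half of the move
```

Pushing one 100-unit blob leaves `size()` at 100 with versioning off, but at 200 with versioning on, of which only 100 is live.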
### How to Reproduce
I've created two US multi-region buckets in the GCP `gitlab-internal` project, one with versioning disabled and another with versioning enabled (these are still there in case anyone wants to check them):
```
$ gsutil mb gs://registry-versioning-test-off
$ gsutil versioning set off gs://registry-versioning-test-off
$ gsutil mb gs://registry-versioning-test-on
$ gsutil versioning set on gs://registry-versioning-test-on
```
Confirm the versioning setting of each bucket:
```
$ gsutil ls -Lb gs://registry-versioning-test-off
gs://registry-versioning-test-off/ :
Storage class: STANDARD
Location type: multi-region
Location constraint: US
Versioning enabled: False
...
$ gsutil ls -Lb gs://registry-versioning-test-on
gs://registry-versioning-test-on/ :
Storage class: STANDARD
Location type: multi-region
Location constraint: US
Versioning enabled: True
...
```
Build an image `FROM scratch` with a single random 512 MiB (uncompressed) layer:
```
$ openssl rand -out layer -base64 396458519;
$ du -h layer
512M layer
$
$ cat Dockerfile
FROM scratch
ADD layer /
```
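The unusual byte count passed to `openssl rand` appears to be chosen so that the base64-encoded file comes out at 512 MiB. The Python sketch below reproduces that arithmetic, assuming OpenSSL wraps its base64 output at 64 characters per line, with a newline after every line (the helper name is hypothetical).

```python
def openssl_base64_size(n_random_bytes: int, line_width: int = 64) -> int:
    """Size in bytes of `openssl rand -base64 N` output (assumed 64-char lines)."""
    # 4 output characters per 3 input bytes, rounded up for padding.
    chars = 4 * ((n_random_bytes + 2) // 3)
    # One newline per (possibly partial) output line.
    lines = (chars + line_width - 1) // line_width
    return chars + lines


size = openssl_base64_size(396458519)
print(size)  # 536870913, i.e. 512 MiB (2**29 bytes) plus one byte
```

It also suggests why the compressed layer turns out to be about 387.4 MiB rather than 512 MiB: the file holds only 396,458,519 bytes (about 378 MiB) of actual randomness, so no compressor can shrink it much below that.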
Build the image and push it to a registry configured with `registry-versioning-test-off` as its storage backend:
```
$ docker build -t 0.0.0.0:5000/repo-a:latest .
$ docker push 0.0.0.0:5000/repo-a:latest
```
Check `registry-versioning-test-off` contents and size (only listing leaf objects here):
```
$ gsutil du -ach gs://registry-versioning-test-off
443 B gs://registry-versioning-test-off/docker/registry/v2/blobs/sha256/63/63e7b328289eb69a08438a7360768a5820577cad16f584727c1e7ec5fe2e2dad/data#1616584699041380
387.4 MiB gs://registry-versioning-test-off/docker/registry/v2/blobs/sha256/cf/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/data#1616584689687269
529 B gs://registry-versioning-test-off/docker/registry/v2/blobs/sha256/f2/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/data#1616584703674179
71 B gs://registry-versioning-test-off/docker/registry/v2/repositories/repo-a/_layers/sha256/63e7b328289eb69a08438a7360768a5820577cad16f584727c1e7ec5fe2e2dad/link#1616584699673762
71 B gs://registry-versioning-test-off/docker/registry/v2/repositories/repo-a/_layers/sha256/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/link#1616584690416832
71 B gs://registry-versioning-test-off/docker/registry/v2/repositories/repo-a/_manifests/revisions/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616584703977008
71 B gs://registry-versioning-test-off/docker/registry/v2/repositories/repo-a/_manifests/tags/latest/current/link#1616584704757836
71 B gs://registry-versioning-test-off/docker/registry/v2/repositories/repo-a/_manifests/tags/latest/index/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616584704374060
387.4 MiB total
```
**Note:** the `-a` flag includes non-current object versions for a bucket with Object Versioning enabled (if any).
Nothing unexpected here: there is nothing under `docker/registry/v2/repositories/repo-a/_uploads/`, as the temporary upload artifacts were _permanently_ deleted. The compressed image layer was stored at `docker/registry/v2/blobs/sha256/cf/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/data`, with a size of 387.4 MiB. The remaining objects (metadata) are irrelevant to this issue.
Repeat the `docker push` with the registry configured to use `registry-versioning-test-on` as its storage backend, then list the bucket contents and size:
```
$ gsutil du -ach gs://registry-versioning-test-on
443 B gs://registry-versioning-test-on/docker/registry/v2/blobs/sha256/63/63e7b328289eb69a08438a7360768a5820577cad16f584727c1e7ec5fe2e2dad/data#1616585017538101
387.4 MiB gs://registry-versioning-test-on/docker/registry/v2/blobs/sha256/cf/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/data#1616585007728942
529 B gs://registry-versioning-test-on/docker/registry/v2/blobs/sha256/f2/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/data#1616585022506818
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_layers/sha256/63e7b328289eb69a08438a7360768a5820577cad16f584727c1e7ec5fe2e2dad/link#1616585018147353
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_layers/sha256/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/link#1616585008519894
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_manifests/revisions/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616585022784041
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_manifests/tags/latest/current/link#1616585023381882
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_manifests/tags/latest/index/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616585023083321
0 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/data#1616585012583258
443 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/data#1616585013898030
443 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/data#1616585015367362
20 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/startedat#1616585011918776
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/0#1616585012275504
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/0#1616585015725178
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/443#1616585013601098
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/443#1616585014298392
0 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/data#1616584943212093
152.73 KiB gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/data#1616585004017072
387.4 MiB gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/data#1616585005620154
20 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/startedat#1616584942599367
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/0#1616584942936298
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/0#1616585006116258
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/406217456#1616585002931778
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/406217456#1616585004377840
774.95 MiB total
```
Here we can see the `docker/registry/v2/repositories/repo-a/_uploads/` artifacts: because object versioning is enabled, they still count towards the bucket size despite having been deleted. In particular, the non-current 387.4 MiB version of `_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/data` duplicates the layer blob, so the bucket size is 774.95 MiB instead of 387.4 MiB.
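To check where the extra space goes, the listing can be split into objects under `_uploads/` and everything else. A rough Python sketch, assuming the `<size> <unit> <URL>` line format shown above; `split_uploads` is a hypothetical helper, and sizes parsed from `-h` (human-readable) output are only approximate.

```python
import re
from typing import Tuple

# Binary unit multipliers as printed by `gsutil du -h`.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}
LINE = re.compile(r"^\s*([\d.]+)\s+(B|KiB|MiB|GiB|TiB|PiB)\s+(gs://\S+)$")


def split_uploads(du_output: str) -> Tuple[int, int]:
    """Return (bytes under _uploads/, all other bytes) from `gsutil du -ah` text."""
    uploads = other = 0
    for line in du_output.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skips the "total" line and anything unexpected
        size = round(float(m.group(1)) * UNITS[m.group(2)])
        if "/_uploads/" in m.group(3):
            uploads += size
        else:
            other += size
    return uploads, other
```

Applied to the listing above, roughly half of the 774.95 MiB total falls under `_uploads/`.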
In a second test, push the same image to a different repository:
```
$ docker tag 0.0.0.0:5000/repo-a:latest 0.0.0.0:5000/repo-b:latest
$ docker push 0.0.0.0:5000/repo-b:latest
```
```
$ gsutil du -ach gs://registry-versioning-test-on
443 B gs://registry-versioning-test-on/docker/registry/v2/blobs/sha256/63/63e7b328289eb69a08438a7360768a5820577cad16f584727c1e7ec5fe2e2dad/data#1616585017538101
387.4 MiB gs://registry-versioning-test-on/docker/registry/v2/blobs/sha256/cf/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/data#1616585007728942
529 B gs://registry-versioning-test-on/docker/registry/v2/blobs/sha256/f2/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/data#1616585022506818
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_layers/sha256/63e7b328289eb69a08438a7360768a5820577cad16f584727c1e7ec5fe2e2dad/link#1616585018147353
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_layers/sha256/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/link#1616585008519894
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_manifests/revisions/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616585022784041
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_manifests/tags/latest/current/link#1616585023381882
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_manifests/tags/latest/index/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616585023083321
0 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/data#1616585012583258
443 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/data#1616585013898030
443 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/data#1616585015367362
20 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/startedat#1616585011918776
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/0#1616585012275504
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/0#1616585015725178
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/443#1616585013601098
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/9e6d43d0-f0db-48a5-9103-73bfc109f880/hashstates/sha256/443#1616585014298392
0 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/data#1616584943212093
152.73 KiB gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/data#1616585004017072
387.4 MiB gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/data#1616585005620154
20 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/startedat#1616584942599367
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/0#1616584942936298
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/0#1616585006116258
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/406217456#1616585002931778
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-a/_uploads/db0a6061-ddae-461d-b8a4-0bd327dbbdda/hashstates/sha256/406217456#1616585004377840
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_layers/sha256/63e7b328289eb69a08438a7360768a5820577cad16f584727c1e7ec5fe2e2dad/link#1616585401952747
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_layers/sha256/cfb9f0ca157aa874badee3485648f5f29cda741e2c98c83994735965f1cd1fcc/link#1616585394126589
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_manifests/revisions/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616585406319770
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_manifests/tags/latest/current/link#1616585406970659
71 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_manifests/tags/latest/index/sha256/f2cd18d7a16d9da638bd1f1a16563d490b7d73419043e9777e9bda1b6ee5990f/link#1616585406611819
0 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/data#1616585321825376
152.73 KiB gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/data#1616585390786295
387.4 MiB gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/data#1616585392392549
20 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/startedat#1616585321155156
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/hashstates/sha256/0#1616585321550860
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/hashstates/sha256/0#1616585392760734
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/hashstates/sha256/406217456#1616585389667660
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/7ef09112-e857-40af-9f05-4e6242efc460/hashstates/sha256/406217456#1616585391151247
0 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/data#1616585398188825
443 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/data#1616585399259976
443 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/data#1616585400483521
20 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/startedat#1616585397631682
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/hashstates/sha256/0#1616585397912117
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/hashstates/sha256/0#1616585400852581
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/hashstates/sha256/443#1616585398951063
108 B gs://registry-versioning-test-on/docker/registry/v2/repositories/repo-b/_uploads/ed14804d-0b97-4295-a372-36dcb7310816/hashstates/sha256/443#1616585399608695
1.14 GiB total
```
The bucket size is now 1.14 GiB, while the live objects total just 387.4 MiB. Compare the output with the `-a` flag (which includes non-current versions) and without it:
```
$ gsutil du -sach gs://registry-versioning-test-on
1.14 GiB gs://registry-versioning-test-on
1.14 GiB total
$ gsutil du -sch gs://registry-versioning-test-on
387.4 MiB gs://registry-versioning-test-on
387.4 MiB total
```
### Conclusions
The registry deduplicates blobs, so a single blob is stored only once in shared storage. However, because versioning is enabled, the temporary copies (local to each repository) continue to count towards the bucket size after being deleted.
Based on the experiment above, if a given layer of size `S` is pushed `N` times to different repositories, the overall bucket size grows by `(N+1)*S` (one live copy in common storage plus one non-current staged copy per push), that is, `N*S` more than necessary, instead of just `S` for the first push and nothing for subsequent pushes (when the blob already exists in common storage).
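The arithmetic can be checked with a small Python sketch under a simplified model (hypothetical helper; metadata and the small intermediate upload versions are ignored, and, as in the test above, the layer is assumed to be uploaded again for each repository rather than mounted): with versioning, the bucket holds the single live copy `S` plus one non-current staged copy per push, i.e. `N*S` of overhead.

```python
MIB = 2**20


def expected_bucket_size(layer_bytes: int, pushes: int, versioning: bool) -> int:
    """Approximate bucket size after pushing one layer to `pushes` different repositories.

    Simplified model: ignores metadata objects and small intermediate upload
    versions; assumes the layer is re-uploaded for every repository.
    """
    if pushes <= 0:
        return 0
    live = layer_bytes  # single deduplicated copy in common storage
    noncurrent = pushes * layer_bytes if versioning else 0  # one staged copy per push
    return live + noncurrent


S = round(387.4 * MIB)  # compressed layer size from the listings above
for n in (1, 2):
    # Observed above: 774.95 MiB (N=1) and 1.14 GiB (N=2).
    print(n, expected_bucket_size(S, n, versioning=True) / MIB)
```

Under this model the live share of the bucket is `1/(N+1)`, consistent with the roughly 2x and 3x totals observed after one and two pushes.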
## Expectations
If object versioning is enabled on the production bucket, and the storage metrics we collect don't ignore non-current versions, then we don't actually have the ~10 PiB of live data we thought; we likely have far less (possibly less than half?).
Regarding billing, "_for buckets with Object Versioning enabled, each noncurrent version of an object is charged at the same rate as the live version of the object_" ([source](https://cloud.google.com/storage/pricing#storage-notes)).
## Questions
- Can we check when object versioning was turned on for the production bucket? Has it been enabled since the beginning?
Versioning has been on practically from the beginning, roughly 3 years ago: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/3681
- Do we have any [Object Lifecycle Management](https://cloud.google.com/storage/docs/lifecycle) policies defined for the production bucket?
No, see https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12932#note_538017141
- Can we confirm whether the bucket size metrics we collect include or ignore non-current object versions?
Bucket size metrics do include non-current object versions, and we do pay for them: https://cloud.google.com/storage/docs/object-versioning
## Discussion/Required Steps
- Once we know when versioning was enabled, we can make a rough estimate of the percentage of live data we actually have.
- Consider disabling object versioning. The only significant benefit I can see in versioning is the ability to recover accidentally deleted objects. However, given how the registry operates and its stability, the cost is likely not worth the possible benefit (?).
Alternatively, we could explore deleting non-current versions automatically with a lifecycle policy or an [`on Archive` trigger](https://firebase.google.com/docs/storage/extend-with-functions).
- Disabling versioning will stop the problem from growing from that moment onwards, but it won't delete the non-current versions already created. We should define an Object Lifecycle Management policy to delete old object versions.
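As a sketch of that last step, the Python snippet below builds an Object Lifecycle Management config that deletes non-current versions, using the `isLive` and `daysSinceNoncurrentTime` lifecycle conditions. The helper name is hypothetical, and the 7-day grace period is an assumption rather than a recommendation: it would keep a short window for recovering accidental deletes, the one benefit of versioning noted above.

```python
import json
from typing import Optional


def noncurrent_delete_policy(days_since_noncurrent: Optional[int] = None) -> dict:
    """Build a lifecycle config (JSON API format) that deletes non-current versions.

    With days_since_noncurrent=None every non-current version is eligible;
    otherwise only versions that have been non-current for at least that many days.
    """
    condition = {"isLive": False}
    if days_since_noncurrent is not None:
        condition["daysSinceNoncurrentTime"] = days_since_noncurrent
    return {"rule": [{"action": {"type": "Delete"}, "condition": condition}]}


if __name__ == "__main__":
    # Assumed 7-day grace period; adjust as needed.
    print(json.dumps(noncurrent_delete_policy(days_since_noncurrent=7), indent=2))
```

The output could be saved to a file and applied with `gsutil lifecycle set lifecycle.json gs://<registry bucket>` (placeholder bucket name). Because lifecycle rules are evaluated against every object in the bucket, this would also clean up the non-current versions that already exist, which disabling versioning alone does not do. Lifecycle deletions run asynchronously, so the bucket size would shrink gradually rather than immediately.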
## Weighting - 9/10
### Cost Savings - $276,000/QTR
The cost savings estimate is based on a rough guess that 40% of the current total registry storage could be cleaned up this way. The tests above show that non-current objects can take up 2x or more the storage of live objects, and given that these versions have accumulated since the beginning and were created every time an image was pushed, updated, or deleted, it is likely that a large portion of the storage is due to this. An exact number would require more research.
### Customer Impact - <5% of Users
There should be no impact on customers, since this only deletes objects that are not in use (essentially versioned backups of updated or deleted images, some of which are extremely old).
### Future Potential Cost Impact - < $276,000/QTR
Same as the savings, as this is the opportunity cost of doing nothing.
### Effort Required - 1 wk - 1mo
This would be a fairly basic configuration change: adding Object Lifecycle Management rules to delete non-current objects.
- weight factors: https://about.gitlab.com/handbook/engineering/infrastructure/cost-management/infrafin-board/