Provide an easy way to extract a raw Git repository from a Gitaly server
Overview
Sometimes engineers on the Gitaly team need an extract of a Git repository so that they can debug it and run performance tests locally, especially for large repositories like gitlab-org/gitlab or gitlab-com/www-gitlab-com.
At the moment we have to go through the following dance:
# Check file size
root@file-praefect-01-stor-gprd.c.gitlab-production.internal:/var/opt/gitlab/git-data/repositories# du -sh @hashed/fa/53/fa539965395b8382145f8370b34eab249cf610d2d6f2943c95b9b9d08a63d4a3.git
902M @hashed/fa/53/fa539965395b8382145f8370b34eab249cf610d2d6f2943c95b9b9d08a63d4a3.git
root@file-praefect-01-stor-gprd.c.gitlab-production.internal:/var/opt/gitlab/git-data/repositories# du -sh @pools/ec/c1/ecc16e5a1ae6ebb3354ba562e78a68729822b8caf4c221985df72ddc68ed0880.git
10G @pools/ec/c1/ecc16e5a1ae6ebb3354ba562e78a68729822b8caf4c221985df72ddc68ed0880.git
# Create tar
root@file-praefect-01-stor-gprd.c.gitlab-production.internal:/var/opt/gitlab/git-data/repositories# tar -cf gitlab-com-www-gitlab-com.tar @hashed/fa/53/fa539965395b8382145f8370b34eab249cf610d2d6f2943c95b9b9d08a63d4a3.git @pools/ec/c1/ecc16e5a1ae6ebb3354ba562e78a68729822b8caf4c221985df72ddc68ed0880.git
tar: @pools/ec/c1/ecc16e5a1ae6ebb3354ba562e78a68729822b8caf4c221985df72ddc68ed0880.git/objects/pack/pack-7baac1f26735592cf3a42a1f9d481db1bad1a71d.pack: file changed as we read it
# Copy secrets file stored locally to machine, since the service account attached to the machine doesn't have scope to access GCS.
~ scp credentials-steveazz-gitlab-gprd-tmp.json file-praefect-01-stor-gprd.c.gitlab-production.internal:/home/steve/
# Copy to GCS bucket
root@file-praefect-01-stor-gprd.c.gitlab-production.internal:/var/opt/gitlab/git-data/repositories# gcloud auth activate-service-account --key-file /home/steve/credentials-steveazz-gitlab-gprd-tmp.json
root@file-praefect-01-stor-gprd.c.gitlab-production.internal:/var/opt/gitlab/git-data/repositories# gsutil cp gitlab-com-www-gitlab-com.tar gs://gitlab-gprd-tmp/reliability-16828/gitlab-com-www-gitlab-com.tar
# Create Signed URL for engineer
~ gsutil signurl credentials-steveazz-gitlab-gprd-tmp.json gs://gitlab-gprd-tmp/reliability-16828/gitlab-com-www-gitlab-com.tar
Past requests:
- https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16828
- https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16674
- https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17469
- https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17596
Proposal
Automate this, with the hope of being able to specify just the project path and get back a signed URL that we can share with the engineer who requested it.
There are multiple ways we can do this: one is to use Ansible to run everything, another is a bash script.
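As a starting point for the bash option, here is a minimal sketch that only prints the commands from the manual steps above instead of running them. It is not a finished tool. The function names `hashed_repo_path` and `extract_commands` are hypothetical. The sketch assumes the numeric project ID has already been looked up from the project path, and relies on GitLab hashed storage deriving the on-disk path from the SHA256 of that ID. The pool repository path, bucket prefix, and key file must be supplied by the caller, and `sha256sum` is assumed to be available on the machine.

```shell
#!/bin/sh
# Sketch only: prints the extraction commands for review, does not run them.
set -eu

# Derive the hashed-storage path from a numeric project ID.
# Assumption: path is @hashed/<hash[0:2]>/<hash[2:4]>/<hash>.git,
# where <hash> is the SHA256 hex digest of the project ID.
hashed_repo_path() {
  local project_id="$1" hash
  hash="$(printf '%s' "$project_id" | sha256sum | cut -d' ' -f1)"
  printf '@hashed/%s/%s/%s.git\n' \
    "$(printf '%s' "$hash" | cut -c1-2)" \
    "$(printf '%s' "$hash" | cut -c3-4)" \
    "$hash"
}

# Print the tar, upload, and sign commands for one extraction.
# Arguments (all hypothetical names chosen for this sketch):
#   $1   archive name without extension
#   $2   GCS destination prefix, e.g. gs://<bucket>/<issue>
#   $3   service account key file
#   $4.. repository paths relative to the repositories directory
extract_commands() {
  local name="$1" bucket_prefix="$2" key_file="$3"
  shift 3
  # Run on the Gitaly node, inside the repositories directory.
  printf 'tar -cf %s.tar %s\n' "$name" "$*"
  printf 'gcloud auth activate-service-account --key-file %s\n' "$key_file"
  printf 'gsutil cp %s.tar %s/%s.tar\n' "$name" "$bucket_prefix" "$name"
  # Run wherever the key file lives, to produce the shareable URL.
  printf 'gsutil signurl %s %s/%s.tar\n' "$key_file" "$bucket_prefix" "$name"
}
```

For example, `extract_commands my-extract gs://<bucket>/<issue> key.json "$(hashed_repo_path <project-id>)" <pool-path>` would print the four commands for review. Printing rather than executing keeps the sketch safe to try, and the same function bodies could later be turned into Ansible tasks or real command invocations.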
Edited by Steve Xuereb