AWS S3 Multipart Upload leaves incomplete parts on account
Summary
Any upload to an AWS S3 bucket using multipart upload seems to leave dangling parts on the account.
This can be verified with the `aws s3api list-multipart-uploads` command:
$ aws s3api list-multipart-uploads --bucket <gitlab-bucket>
"Uploads": [
{
"UploadId": "jzma4ZvcAneUOthK7wnrk2eNdEGbVMpxFq20UvhLhetbQQaRoDuNVLKl1BqKXmXDUZd.A6rU1UzyU8Oc6n_UY3KFHTEcUWwQAmJoEAxJnPMtHl3ca4xG9Xvh81kpocVO",
"Key": "tmp/uploads/1631026479-30347-0001-8099-45f999ebc2f89234ac777c3d618b1f76",
--> "Initiated": "2021-09-07T14:54:40.000Z",
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "gitlab-aws-master-accounts+1080140f",
"ID": "524020bee91f44b1a8902ff85eb7936e41fc100fcde26edabedc314c02793fd3"
},
"Initiator": {
"ID": "arn:aws:iam::227422707839:user/ddiniz-bd62a51c",
"DisplayName": "ddiniz-bd62a51c"
}
},
According to the aws docs:
This action lists in-progress multipart uploads. An in-progress multipart upload is a multipart upload that has been initiated using the Initiate Multipart Upload request, but has not yet been completed or aborted.
...
In progress multipart uploads incur storage costs in Amazon S3. Complete or abort an active multipart upload to remove its parts from your account.
That being said, the `tmp/uploads` folder is always clean, so it seems that only the multipart uploads themselves are left hanging, while the files are removed.
One customer reported seeing more than 67276 dangling multipart uploads within a 3-day timespan:
% aws s3api list-multipart-uploads --bucket <gitlab-bucket> | grep -c Initiated
67276
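For a backlog like this, the AWS documentation quoted above says that aborting an upload removes its parts. The following is a minimal sketch of doing that in bulk, not a tested cleanup tool. The `BUCKET` and `DRY_RUN` variables and the `abort_uploads` helper are names made up for illustration; only `list-multipart-uploads` and `abort-multipart-upload` are actual `aws s3api` commands. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: abort every in-progress multipart upload in a bucket.
# BUCKET is a placeholder for your bucket name; nothing runs against AWS if it is empty.
BUCKET="${BUCKET:-}"
# DRY_RUN=1 prints the abort commands instead of executing them (assumed default).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: reads "KEY<TAB>UPLOAD_ID" lines on stdin and
# aborts (or prints the abort command for) each upload.
abort_uploads() {
  while IFS="$(printf '\t')" read -r key upload_id; do
    # Skip blank lines and the "None" line printed when there are no uploads.
    [ -n "$upload_id" ] || continue
    if [ "$DRY_RUN" = "1" ]; then
      echo "aws s3api abort-multipart-upload --bucket $BUCKET --key $key --upload-id $upload_id"
    else
      aws s3api abort-multipart-upload \
        --bucket "$BUCKET" --key "$key" --upload-id "$upload_id" </dev/null
    fi
  done
}

if [ -n "$BUCKET" ]; then
  # --query/--output text emit one "KEY<TAB>UPLOAD_ID" line per upload.
  aws s3api list-multipart-uploads --bucket "$BUCKET" \
    --query 'Uploads[].[Key,UploadId]' --output text | abort_uploads
fi
```

Run it once with `BUCKET=<gitlab-bucket>` to review the printed commands, then again with `DRY_RUN=0` to actually abort. Note that this also aborts uploads that are legitimately still in progress, so it is best run when no uploads are active.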
Steps to reproduce
- Set up consolidated object storage settings for AWS S3
- Add a 10MB file to the project
- `git lfs track` the large file - GitLab should automatically use multipart uploads to store the file in the configured S3 bucket
- Run `aws s3api list-multipart-uploads --bucket <gitlab-bucket>` to see the dangling upload parts
Example Project
Only applicable to Self-Managed.
What is the current bug behavior?
Multipart upload parts are left dangling on the user's AWS account.
What is the expected correct behavior?
Multipart upload parts are cleaned up after successful and failed uploads.
Relevant logs and/or screenshots
2021-09-07 17:49:18 2204793 1631026158-32703-0002-3431-4af42baf00b386e3799d87f0bed80a48
Logs grepping by key then by correlation_id:
{"client_mode":"s3","copied_bytes":2204793,"correlation_id":"01FF0BR6V61W0PEAPPGJWR2N64","is_local":false,"is_multipart":true,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1631026158-32703-0002-3431-4af42baf00b386e3799d87f0bed80a48","remote_temp_object":"tmp/uploads/1631026158-32703-0002-3431-4af42baf00b386e3799d87f0bed80a48","temp_file_prefix":"artifacts.zip","time":"2021-09-07T17:49:19+03:00"}
{"client_mode":"local","copied_bytes":32603,"correlation_id":"01FF0BR6V61W0PEAPPGJWR2N64","is_local":true,"is_multipart":false,"is_remote":false,"level":"info","local_temp_path":"/tmp","msg":"saved file","remote_id":"","temp_file_prefix":"metadata.gz","time":"2021-09-07T17:49:19+03:00"}
{"content_type":"application/json","correlation_id":"01FF0BR6V61W0PEAPPGJWR2N64","duration_ms":990,"host":"","level":"info","method":"POST","msg":"access","proto":"HTTP/1.1","referrer":"","remote_addr":"127.0.0.1:0","remote_ip":"127.0.0.1","route":"^/api/v4/jobs/[0-9]+/artifacts\z","status":201,"system":"http","time":"2021-09-07T17:49:19+03:00","ttfb_ms":990,"uri":"/api/v4/jobs/4415375/artifacts?artifact_format=zip\u0026artifact_type=archive","user_agent":"gitlab-runner 14.1.0 (14-1-stable; go1.13.8; linux/amd64)","written_bytes":3}
Output of checks
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
Customers are encouraged to create S3 lifecycle rules that automatically abort and purge such orphaned, incomplete multipart uploads.
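As an illustration of such a rule, here is a minimal sketch that uses the S3 `AbortIncompleteMultipartUpload` lifecycle action. The rule ID, the 7-day threshold, the `lifecycle.json` file name, and the `BUCKET` variable are assumptions chosen for the example, not values from this issue; pick a threshold longer than your longest legitimate upload.

```shell
#!/bin/sh
# Sketch: lifecycle rule that aborts multipart uploads still incomplete
# 7 days after initiation (the threshold is an assumed example value).
# BUCKET is a placeholder; if it is empty, only the JSON file is written.
BUCKET="${BUCKET:-}"

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
EOF

if [ -n "$BUCKET" ]; then
  # Apply the configuration to the bucket.
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$BUCKET" \
    --lifecycle-configuration file://lifecycle.json
else
  echo "BUCKET not set; wrote lifecycle.json only"
fi
```

Be aware that `put-bucket-lifecycle-configuration` replaces the bucket's entire existing lifecycle configuration, so if the bucket already has rules, merge this rule into them (see `aws s3api get-bucket-lifecycle-configuration`) rather than applying the file as-is.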