Exporting to S3 via API never returns Finished status
Summary
When using the export API to export a project to an S3 bucket, the export_status never returns finished, as one would expect once the export has completed. Instead, the only export_status returned when using this method is none, which is counterintuitive and limiting when exporting multiple projects programmatically with a script. This also differs from regular API exports.
Steps to reproduce
- Create an S3 bucket and generate a pre-signed URL. In my case, I used boto3 and a Python script to generate it. The result looks something like this:
❯ python3 export_script_b.py --bucket-name cbledsoe-test-bucket --object-name download
curl -i --request PUT --upload-file download 'https://cbledsoe-test-bucket.s3.amazonaws.com/download?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210329T214809Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=XXX'
- Send a POST export request with curl, as outlined in our documentation, using the previously generated pre-signed URL:
❯ curl --request POST --header "PRIVATE-TOKEN: <MY_PAT>" "https://gitlab.com/api/v4/projects/24702853/export" \
--data "upload[http_method]=PUT" \
--data-urlencode "upload[url]=https://cbledsoe-test-bucket.s3.amazonaws.com/download?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210329T214809Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=XXX"
{"message":"202 Accepted"}%
In this case, the API responded with 202 Accepted and the file was uploaded to the bucket as expected a few seconds later.
- Immediately check the status of the export:
❯ curl -s --header "PRIVATE-TOKEN: <REDACTED>" https://gitlab.com/api/v4/projects/24702853/export | jq
{
"id": 24702853,
"description": "",
"name": "cbledsoe-small-project",
"name_with_namespace": "GitLab.com - Gold / cbledsoe-test / cbledsoe-small-project",
"path": "cbledsoe-small-project",
"path_with_namespace": "gitlab-gold/cbledsoe-test/cbledsoe-small-project",
"created_at": "2021-02-25T21:04:46.477Z",
"export_status": "none"
}
Note how the export_status is none.
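To illustrate why this is limiting for scripted exports, here is a minimal polling sketch in Python. It assumes the same endpoint and PRIVATE-TOKEN header as the curl commands above; the function names fetch_export_status and wait_for_export, and the attempts and delay parameters, are hypothetical and only for illustration.

```python
import json
import time
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"


def fetch_export_status(project_id, token):
    """Return the export_status string for one project (performs a real HTTP GET)."""
    request = urllib.request.Request(
        f"{GITLAB_API}/projects/{project_id}/export",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["export_status"]


def wait_for_export(fetch, attempts=10, delay=5.0):
    """Poll fetch() until it reports "finished"; return every status seen.

    fetch is any zero-argument callable returning a status string, which
    keeps the loop usable without network access (hypothetical helper).
    """
    seen = []
    for _ in range(attempts):
        status = fetch()
        seen.append(status)
        if status == "finished":
            break
        time.sleep(delay)
    return seen
```

A script would call something like wait_for_export(lambda: fetch_export_status(24702853, "<MY_PAT>")). With the behavior described in this issue, an export uploaded to S3 would only ever yield none, so the loop runs out of attempts without being able to tell a completed export from one that never started.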
What is the current bug behavior?
When exporting to an S3 bucket via our API, the export_status returns none immediately after export.
What is the expected correct behavior?
When exporting to an S3 bucket via our API, the export_status should return finished immediately after export.
Additional Thoughts
Note that this behavior differs from how regular exports work:
❯ curl --request POST --header "PRIVATE-TOKEN: <REDACTED>" https://gitlab.com/api/v4/projects/24702853/export
{"message":"202 Accepted"}%
❯ curl -s --header "PRIVATE-TOKEN: <REDACTED>" https://gitlab.com/api/v4/projects/24702853/export | jq
{
"id": 24702853,
"description": "",
"name": "cbledsoe-small-project",
"name_with_namespace": "GitLab.com - Gold / cbledsoe-test / cbledsoe-small-project",
"path": "cbledsoe-small-project",
"path_with_namespace": "gitlab-gold/cbledsoe-test/cbledsoe-small-project",
"created_at": "2021-02-25T21:04:46.477Z",
"export_status": "finished",
"_links": {
"api_url": "https://gitlab.com/api/v4/projects/24702853/export/download",
"web_url": "https://gitlab.com/gitlab-gold/cbledsoe-test/cbledsoe-small-project/download_export"
}
}
Output of checks
This happens on GitLab.com 13.11.0-pre a3f68244
ZD Tickets
Proposed solution
The method that returns export status returns none as the last resort.
The web upload strategy deletes the export file after it's done uploading the file to the destination. Since the export file no longer exists, the export status is set to none.
To fix this, we should keep the export file instead of removing it after the upload, which matches the logic we already have for regular exports.
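The reasoning above can be sketched as a simplified Python model. This is not GitLab's actual code: ExportState, export_status, and finish_web_upload are hypothetical names, and the model only covers the file-exists check and the none fallback described here, leaving out any other conditions the real method evaluates first.

```python
from dataclasses import dataclass


@dataclass
class ExportState:
    """Hypothetical stand-in for a project's export state."""
    file_exists: bool = False


def export_status(state):
    """Report "finished" when an export file exists; otherwise fall back to "none"."""
    if state.file_exists:
        return "finished"
    return "none"  # last resort, as described in the proposed solution


def finish_web_upload(state, keep_file):
    """Simulate the web upload strategy completing an upload.

    keep_file=False models the current behavior (file deleted after upload);
    keep_file=True models the proposed fix (file left in place).
    """
    state.file_exists = keep_file
    return export_status(state)
```

In this model, finish_web_upload(ExportState(), keep_file=False) resolves to none, mirroring the bug, while keep_file=True resolves to finished, which is the behavior the proposed fix aims for.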