Import from Amazon S3
Problem
When importing from remote object storage, GitLab validates the `content-length` and `content-type` of the file before starting the import, so that users learn sooner whether the import will succeed. The validation sends an HTTP `HEAD` request to the given URL and checks the `content-length` and `content-type` headers of the response.
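A minimal sketch of this kind of pre-check, using Ruby's standard `Net::HTTP`. The method names, the 10 GiB limit and the list of accepted content types are illustrative assumptions, not GitLab's actual values or code. The header rules are kept in a separate method so they can be checked without a network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical limits, for illustration only; GitLab's real values may differ.
MAX_IMPORT_SIZE = 10 * 1024**3 # 10 GiB
ALLOWED_CONTENT_TYPES = %w[application/gzip application/x-gzip application/x-tar].freeze

# Checks header values that were already fetched. Returns a list of errors;
# an empty list means the file looks importable.
def import_header_errors(content_length, content_type)
  errors = []
  size = content_length && Integer(content_length, 10, exception: false)
  if size.nil? || size.negative?
    errors << 'missing or invalid content-length'
  elsif size.zero?
    errors << 'file is empty'
  elsif size > MAX_IMPORT_SIZE
    errors << "file is larger than #{MAX_IMPORT_SIZE} bytes"
  end

  # Ignore parameters such as "; charset=binary" when comparing media types.
  media_type = content_type.to_s.split(';').first.to_s.strip.downcase
  errors << "unsupported content-type #{content_type.inspect}" unless ALLOWED_CONTENT_TYPES.include?(media_type)
  errors
end

# Sends a HEAD request (headers only, no response body) and validates the result.
def remote_import_errors(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  return ["HEAD request failed with status #{response.code}"] unless response.is_a?(Net::HTTPSuccess)

  import_header_errors(response['content-length'], response['content-type'])
end
```

For example, `remote_import_errors('https://example.com/export.tar.gz')` returns `[]` when the server reports an acceptable size and type.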
However, an Amazon S3 presigned URL is signed for a single HTTP method (by default `GET`), so a `HEAD` request to a `GET` presigned URL is rejected, and a `GET` request always returns the full file in the response body. To avoid downloading large files just to validate them, we currently skip the validation in the early stages of the import when the file comes from an Amazon S3 presigned URL (!75170 (comment 748059103)).
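One way to recognise such URLs is sketched below, assuming AWS Signature Version 4 presigned URLs, which carry their signing parameters (`X-Amz-Algorithm`, `X-Amz-Credential`, `X-Amz-Signature`, among others) in the query string. The helper name is hypothetical and this is not necessarily how GitLab detects them.

```ruby
require 'uri'

# Query parameters present in every AWS Signature Version 4 presigned URL.
SIGV4_PRESIGNED_PARAMS = %w[X-Amz-Algorithm X-Amz-Credential X-Amz-Signature].freeze

# Hypothetical helper: true when the URL looks like an S3 SigV4 presigned URL,
# in which case the HEAD-based validation would be skipped.
def aws_presigned_url?(url)
  query = URI.parse(url).query
  return false if query.nil?

  keys = URI.decode_www_form(query).map(&:first)
  SIGV4_PRESIGNED_PARAMS.all? { |param| keys.include?(param) }
rescue URI::InvalidURIError, ArgumentError
  # Unparseable URLs are treated as "not presigned" so normal validation applies.
  false
end
```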
Proposed solution
Create a new endpoint, such as `POST /projects/aws-s3-import`, where the user passes the S3-specific information required to retrieve the file:
- access_key_id
- secret_access_key
- bucket_name
- file_key
Then, using the AWS SDK for Ruby (`aws-sdk-s3` gem, https://github.com/aws/aws-sdk-ruby), which is already in GitLab, we could validate the file and create the presigned URL to be saved in the database. Something like:
```ruby
require 'aws-sdk-s3'

# The client also needs a region unless one is configured globally (assumed here to be params[:region]).
s3_client = Aws::S3::Client.new(region: params[:region], access_key_id: params[:access_key_id], secret_access_key: params[:secret_access_key])
file = Aws::S3::Object.new(params[:bucket_name], params[:file_key], client: s3_client)
file.content_length # => file size in bytes (loaded with a HEAD request, nothing is downloaded)
file.content_type # => file MIME type
file.presigned_url(:get, expires_in: 2.days.to_i) # => presigned GET URL to be saved and used for the import
```
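The endpoint flow could then look roughly like this. The S3 object is passed in so the logic can be exercised without AWS credentials: anything responding to `content_length`, `content_type` and `presigned_url`, such as the `Aws::S3::Object` above, works. The method name, result struct, size limit and content types are hypothetical; the expiry is written in plain seconds instead of ActiveSupport's `2.days`.

```ruby
# Hypothetical values, for illustration only.
MAX_S3_IMPORT_SIZE = 10 * 1024**3 # 10 GiB
PRESIGNED_URL_TTL = 2 * 24 * 60 * 60 # 2 days in seconds; SigV4 presigned URLs allow at most 7 days
ALLOWED_S3_CONTENT_TYPES = %w[application/gzip application/x-gzip application/x-tar].freeze

ImportCheck = Struct.new(:ok, :error, :url)

# Validates the object's metadata first and only then creates the presigned
# GET URL that would be stored for the import worker.
def prepare_s3_import(s3_object)
  size = s3_object.content_length.to_i
  return ImportCheck.new(false, 'file is empty', nil) if size <= 0
  return ImportCheck.new(false, 'file is too large', nil) if size > MAX_S3_IMPORT_SIZE

  content_type = s3_object.content_type.to_s.downcase
  unless ALLOWED_S3_CONTENT_TYPES.include?(content_type)
    return ImportCheck.new(false, "unsupported content-type #{content_type.inspect}", nil)
  end

  ImportCheck.new(true, nil, s3_object.presigned_url(:get, expires_in: PRESIGNED_URL_TTL))
end
```

With the real SDK this would be called as `prepare_s3_import(file)`, which surfaces size and type problems in the web request instead of later in Sidekiq.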
Original discussion
The following discussion from !75170 (merged) should be addressed:
@reprazent started a discussion: Wouldn't changing this result in the body of the response containing the entire archive? I don't think that's something we'd want to do from the web request that creates the project, right?
Would it perhaps be better to check for the presence of one of the other `x-amz` headers and skip the `content-length` check in those cases? https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_ResponseSyntax
We'd have to handle the error when we do download the file in Sidekiq and there turns out to be no content.
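The reviewer's alternative can be sketched as follows, assuming the response headers are available as a hash. The method names and the `:ok`/`:invalid`/`:skipped` results are hypothetical; this only illustrates detecting S3 through `x-amz-*` headers and is not GitLab's actual implementation.

```ruby
# Hypothetical helper: S3 adds x-amz-* headers (for example x-amz-request-id)
# to its responses, so their presence marks an S3 response.
def s3_response?(headers)
  headers.keys.any? { |name| name.to_s.downcase.start_with?('x-amz-') }
end

# Returns :ok, :invalid, or :skipped. For S3 responses the content-length check
# is skipped, so the Sidekiq worker must handle a download that turns out empty.
def content_length_check(headers)
  return :skipped if s3_response?(headers)

  length = headers.find { |name, _| name.to_s.casecmp?('content-length') }&.last
  size = length && Integer(length.to_s, 10, exception: false)
  size&.positive? ? :ok : :invalid
end
```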