
Import from Amazon S3

Problem

When importing from remote object storage, GitLab validates content-length and content-type before starting the import, so users get early feedback if the import is going to fail. The validation is done with an HTTP HEAD request to the given URL, which returns the file's Content-Length and Content-Type headers without downloading the body.
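For illustration, the early check amounts to something like the plain-Ruby sketch below; the URL, size limit, and accepted content type are placeholders, not GitLab's actual values:

require 'net/http'

# Hypothetical values, for illustration only.
import_url = URI('https://example.com/project-export.tar.gz')
max_size   = 10 * 1024**3 # 10 GiB

# HEAD returns only the headers, so the file body is never downloaded.
response = Net::HTTP.start(import_url.host, import_url.port, use_ssl: import_url.scheme == 'https') do |http|
  http.head(import_url.request_uri)
end

raise 'file too large'          if response['content-length'].to_i > max_size
raise 'unexpected content type' unless response['content-type'] == 'application/gzip'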

However, an Amazon S3 presigned URL is only valid for a single HTTP verb (GET by default), and a GET always returns the full file in the response. To avoid downloading large files just to validate them, imports from Amazon S3 presigned URLs currently skip this validation in the early stages of the import !75170 (comment 748059103).

Proposed solution

Create a new endpoint like POST /projects/aws-s3-import where the user can pass the S3-specific information required to retrieve the file (see the sketch after this list):

  1. access_key_id
  2. secret_access_key
  3. bucket_name
  4. file_key
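
Since GitLab's API is built on Grape, the endpoint could look roughly like the sketch below; the route and parameter names simply mirror the list above, and the class name is made up for illustration:

require 'grape'

# Hypothetical endpoint; the class and route names are illustrative.
class AwsS3ImportAPI < Grape::API
  params do
    requires :access_key_id,     type: String
    requires :secret_access_key, type: String
    requires :bucket_name,       type: String
    requires :file_key,          type: String
  end
  post 'projects/aws-s3-import' do
    # Build an S3 client from the supplied credentials, validate the
    # object's size and type, then generate and persist a presigned URL
    # (see the snippet below).
  end
end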

Then, using the aws-sdk-s3 gem (https://github.com/aws/aws-sdk-ruby), which is already a GitLab dependency, we could validate the file and create the URL to be saved in the database. Something like:

s3_client = Aws::S3::Client.new(access_key_id: params[:access_key_id], secret_access_key: params[:secret_access_key])
file = Aws::S3::Object.new(params[:bucket_name], params[:file_key], client: s3_client)
file.content_length # => retrieves the file size in bytes
file.content_type   # => retrieves the file's MIME type
file.presigned_url(:get, expires_in: 2.days.to_i) # => creates the presigned GET URL to be saved and used for the import
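
Tying it together, a hedged sketch of how the validation step might be wrapped, assuming a made-up size cap and error messages. Note that the SDK also requires the bucket's region (via an argument or the AWS_REGION environment variable), which suggests the endpoint may need a region parameter as well:

require 'aws-sdk-s3'

MAX_IMPORT_SIZE = 10 * 1024**3 # 10 GiB; illustrative cap, not GitLab's actual limit

def validated_presigned_url(params)
  client = Aws::S3::Client.new(
    access_key_id:     params[:access_key_id],
    secret_access_key: params[:secret_access_key],
    region:            params[:region] # assumed extra parameter; the SDK requires a region
  )
  file = Aws::S3::Object.new(params[:bucket_name], params[:file_key], client: client)

  raise ArgumentError, 'import file too large' if file.content_length > MAX_IMPORT_SIZE

  # Presign for GET so the existing download code path can fetch the file.
  file.presigned_url(:get, expires_in: 2 * 24 * 60 * 60)
rescue Aws::S3::Errors::NotFound
  raise ArgumentError, 'file not found in bucket'
end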

Original discussion: the following discussion from !75170 (merged) should be addressed: