Add validation to bulk import post params

Carla Drago requested to merge 383760-validate-input-data into master

What does this MR do and why?

This change addresses param strings that arrive in an invalid format in POST requests to the bulk imports API.

It adds custom validation classes for the params and updates the Gitlab::Regex module with two methods: bulk_import_namespace_path_regex, which the validators use against source_full_path and destination_namespace, and group_path_regex, which is used against the destination_slug and destination_name params.
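
As a rough illustration of how the params map onto the two patterns, here is a minimal plain-Ruby sketch. It is not the validator classes added in this MR: PARAM_REGEXES, SAMPLE_ENTITY and invalid_params are hypothetical names, and skipping absent params is an assumption of the sketch. The regex literals are the ones given in the regex breakdown in this description.

```ruby
# Hypothetical sketch, not the validator classes from this MR.
# Both patterns are copied from the regex breakdown in this description.
NAMESPACE_PATH_REGEX = %r/^([.]?)[^\W]([\/]?[.]?[0-9a-z][-_]*)+$/i
GROUP_PATH_REGEX = %r/^[.]?[^\W]([.]?[0-9a-z][-_]*)+$/i

# Which pattern applies to which param, as described in the MR text.
PARAM_REGEXES = {
  'source_full_path'      => NAMESPACE_PATH_REGEX,
  'destination_namespace' => NAMESPACE_PATH_REGEX,
  'destination_slug'      => GROUP_PATH_REGEX,
  'destination_name'      => GROUP_PATH_REGEX
}.freeze

# Returns the names of the params in `entity` that fail their pattern.
# Absent params are skipped (an assumption of this sketch).
def invalid_params(entity)
  failing = PARAM_REGEXES.select do |name, regex|
    entity.key?(name) && !regex.match?(entity[name].to_s)
  end
  failing.keys
end

# The entity from the validation steps in this description.
SAMPLE_ENTITY = {
  'source_full_path'      => 'https://gitlab.com/carld-gl/high-on-a-hill',
  'source_type'           => 'group_entity',
  'destination_slug'      => 'high-on-a-hill',
  'destination_namespace' => 'brilliant-rainbow'
}.freeze

p invalid_params(SAMPLE_ENTITY) # prints ["source_full_path"]
```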

Regex breakdown

We use the regex module in numerous places within the code whenever we need complex string matching.

The bulk_import_namespace_path_regex must ensure that each of the relevant params:

  • includes only characters accepted for group slugs
  • includes a forward slash '/' only when accepted characters follow it

In plain English, the regex takes this form:

(zero or one instance of a period)
[one word character: a letter, digit, or underscore]
((zero or one instance of a forward slash)
 (zero or one instance of a period)
 [one alphanumeric character]
 [any dash or underscore] any number of times) one or more times

In regexp syntax:

([.]?)
[^\W]
([\/]?[.]?[0-9a-z][-_]*)+

The full regex (including the line-start and line-end anchors and the ignore-case option) looks like this:

%r/^([.]?)[^\W]([\/]?[.]?[0-9a-z][-_]*)+$/i
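
To see the full pattern in action, the following Ruby sketch runs it against a handful of sample paths. Apart from the path taken from the validation steps, the sample values are made up for illustration, and the constant names are stand-ins rather than the Gitlab::Regex method itself.

```ruby
# The pattern exactly as written in this description.
BULK_IMPORT_NAMESPACE_PATH_REGEX = %r/^([.]?)[^\W]([\/]?[.]?[0-9a-z][-_]*)+$/i

# Illustrative inputs; only the https:// path comes from the MR text.
SAMPLE_PATHS = [
  'source/full/path',                           # accepted
  '.hidden/group',                              # accepted: one leading period
  'brilliant-rainbow',                          # accepted
  'https://gitlab.com/carld-gl/high-on-a-hill', # rejected: ':' is not allowed
  '/leading-slash',                             # rejected: must start with a word character
  'trailing-slash/',                            # rejected: nothing follows the slash
  'double//slash',                              # rejected: slash followed by a slash
  'has space'                                   # rejected: whitespace is not allowed
].freeze

SAMPLE_PATHS.each do |path|
  verdict = BULK_IMPORT_NAMESPACE_PATH_REGEX.match?(path) ? 'accepted' : 'rejected'
  puts format('%-45s %s', path, verdict)
end
```

One thing to keep in mind when reusing the pattern: in Ruby, `^` and `$` anchor at line boundaries, so a multi-line string whose first line is valid also matches. `\A` and `\z` are the whole-string anchors.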

I've included extensive test cases, but you can see more examples of the validation here

The group_path_regex works similarly, but removes the match for a forward slash '/':

%r/^[.]?[^\W]([.]?[0-9a-z][-_]*)+$/i

examples here
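
The difference between the two patterns can be sketched side by side in Ruby. The constant names and sample values below are illustrative assumptions; the regex literals are copied from this description.

```ruby
# Both patterns as written in this description.
NAMESPACE_PATH = %r/^([.]?)[^\W]([\/]?[.]?[0-9a-z][-_]*)+$/i
GROUP_PATH = %r/^[.]?[^\W]([.]?[0-9a-z][-_]*)+$/i

# Illustrative values: only 'parent/child' should differ between the two.
GROUP_SAMPLES = %w[high-on-a-hill my.group parent/child group.].freeze

GROUP_SAMPLES.each do |value|
  puts format('%-16s namespace=%-5s group=%s',
              value, NAMESPACE_PATH.match?(value), GROUP_PATH.match?(value))
end
# high-on-a-hill and my.group match both patterns,
# parent/child matches only the namespace pattern,
# group. (trailing period) matches neither.
```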

Steps to validate

  1. Make a request with an incorrect source_full_path param, e.g.:
curl --request POST --header "PRIVATE-TOKEN: [YOUR GDK TOKEN]" "http://gdk.test:3000/api/v4/bulk_imports" \
  --header "Content-Type: application/json" \
  --data '{
    "configuration": {
      "url": "https://gitlab.com/",
      "access_token": "[YOUR PRODUCTION TOKEN]"
    },
    "entities": [
      {
        "source_full_path": "https://gitlab.com/carld-gl/high-on-a-hill",
        "source_type": "group_entity",
        "destination_slug": "high-on-a-hill",
        "destination_namespace": "brilliant-rainbow"
      }
    ]
  }'
  2. Observe the response:
{"error":"entities[0][source_full_path] must be a relative path and not include protocol, sub-domain, or domain information. E.g. 'source/full/path' not 'https://example.com/source/full/path'"}
  3. Repeat with malformed destination_namespace and destination_slug params. For example, a malformed destination_slug returns:
{"error":"entities[0][destination_slug] can contain only letters, digits, emojis, '_', '.', dash, or parenthesis. Must start with a letter, digit, emoji or '_', and not end with a forward slash '/'."}
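
For anyone who prefers scripting the check, the request from the first step could be built in Ruby along these lines. This is a sketch using the standard library only: build_bulk_import_request is a hypothetical helper, the token strings are placeholders, and the request is built but not sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) the POST request from the
# first validation step. Tokens are placeholders supplied by the caller.
def build_bulk_import_request(gdk_token:, source_token:, source_full_path:)
  uri = URI('http://gdk.test:3000/api/v4/bulk_imports')
  payload = {
    configuration: { url: 'https://gitlab.com/', access_token: source_token },
    entities: [
      {
        source_full_path: source_full_path,
        source_type: 'group_entity',
        destination_slug: 'high-on-a-hill',
        destination_namespace: 'brilliant-rainbow'
      }
    ]
  }

  request = Net::HTTP::Post.new(uri)
  request['PRIVATE-TOKEN'] = gdk_token
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(payload)
  [uri, request]
end

uri, request = build_bulk_import_request(
  gdk_token: 'YOUR GDK TOKEN',
  source_token: 'YOUR PRODUCTION TOKEN',
  source_full_path: 'https://gitlab.com/carld-gl/high-on-a-hill'
)
puts request.body

# To send it against a running GDK:
#   response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
#   puts response.body
```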

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #383760 (closed)

Edited by Carla Drago
