Stop following symlinks when archiving documents

What does this MR do?

This MR improves the existing fileArchiver implementation, which can enter an infinite loop when the archived paths contain symlinks. The issue stems from how the code handles directory traversal when symlinks are present.

The doublestar library is updated to v4.8.1, and the option doublestar.WithNoFollow() is now passed to doublestar.FilepathGlob so that symlinks are not followed during traversal.

Why was this MR needed?

This MR fixes a symlink cycle issue that would leave doublestar.FilepathGlob hanging forever.
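
To illustrate why not following symlinks makes the traversal terminate, here is a minimal standard-library sketch. It is not the code from this MR, which relies on doublestar.FilepathGlob with doublestar.WithNoFollow(). Instead it uses filepath.WalkDir, which never follows symbolic links. The helper names buildCycle, walkNoFollow, and demo are hypothetical, and the sketch assumes a Unix-like system where creating symlinks needs no special privileges.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// buildCycle (hypothetical helper) creates root/a and root/b plus the symlinks
// a/loop -> ../b and b/back -> ../a. A traversal that follows links would
// bounce between the two directories forever.
func buildCycle(root string) error {
	for _, dir := range []string{"a", "b"} {
		if err := os.MkdirAll(filepath.Join(root, dir), 0o755); err != nil {
			return err
		}
	}
	if err := os.Symlink(filepath.Join("..", "b"), filepath.Join(root, "a", "loop")); err != nil {
		return err
	}
	return os.Symlink(filepath.Join("..", "a"), filepath.Join(root, "b", "back"))
}

// walkNoFollow (hypothetical helper) lists every entry under root, relative to
// root. filepath.WalkDir does not follow symbolic links, so each link is
// reported once as an entry and is never descended into.
func walkNoFollow(root string) ([]string, error) {
	var entries []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		entries = append(entries, filepath.ToSlash(rel))
		return nil
	})
	return entries, err
}

// demo builds the cycle in a temporary directory, walks it, and cleans up.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "symlink-cycle-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := buildCycle(root); err != nil {
		return nil, err
	}
	return walkNoFollow(root)
}

func main() {
	entries, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		fmt.Println(e)
	}
}
```

The walk returns after listing the root, both directories, and both links once each, even though the links point at each other. A traversal that resolved each link and descended into its target would never run out of paths to visit on the same tree.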

What's the best way to test this MR?

Add the following file, cycle_symlinks.sh, to your repository (the job uses it to create symlink cycles in the job Pod).

cycle_symlinks
#!/bin/bash
# Script to create a problematic directory structure with symlink cycles

# Create base directories
mkdir -p project/folder1/subfolder
mkdir -p project/folder2

# Create some sample files
touch project/folder1/file1.txt
touch project/folder1/subfolder/data.csv
touch project/folder2/file2.txt
touch project/folder2/report.csv

# Create problematic symlinks that form cycles
# Cycle: folder1/loop -> folder2, folder2/subfolder -> folder1/subfolder, folder1/subfolder/back -> folder1
ln -s ../folder2 project/folder1/loop
ln -s ../folder1/subfolder project/folder2/subfolder
ln -s ../../folder1 project/folder1/subfolder/back

# Create additional symlinks to make the structure more complex
mkdir -p project/folder3
touch project/folder3/file3.csv
ln -s ../folder3 project/folder2/another
ln -s ../folder1 project/folder3/link_to_folder1

# Create a self-referential directory 
mkdir -p project/selfreferential
ln -s . project/selfreferential/myself

# Final structure will be:
#
# project/
# ├── folder1/
# │   ├── file1.txt
# │   ├── loop -> ../folder2
# │   └── subfolder/
# │       ├── data.csv
# │       └── back -> ../../folder1
# ├── folder2/
# │   ├── file2.txt
# │   ├── report.csv
# │   ├── subfolder -> ../folder1/subfolder
# │   └── another -> ../folder3
# ├── folder3/
# │   ├── file3.csv
# │   └── link_to_folder1 -> ../folder1
# └── selfreferential/
#     └── myself -> .

echo "Created problematic directory structure with symlink cycles"
gitlab-ci
variables:
  FF_USE_POWERSHELL_PATH_RESOLVER: "true"
  FF_RETRIEVE_POD_WARNING_EVENTS: "true"
  FF_PRINT_POD_EVENTS: "true"
  FF_SCRIPT_SECTIONS: "true"
  CI_DEBUG_SERVICES: "true"
  FF_USE_FASTZIP: "false"

node_modules_tests:
  image: alpine
  script:
    - sh cycle_symlinks.sh
  cache:
    paths:
    - "**/project"
config.toml
listen_address = ":9252"
concurrent = 3
check_interval = 1
log_format = "runner"
connection_max_age = "15m0s"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  pre_get_sources_script = "git config --system --add safe.directory $CI_PROJECT_DIR"
  post_get_sources_script = "git config --local --add safe.directory $CI_PROJECT_DIR"

  name = "investigation"
  limit = 50
  url = "https://gitlab.com/"
  id = 0
  token = "glrt-REDACTED"
  token_obtained_at = 2024-09-30T14:38:04.623237Z
  executor = "kubernetes"
  environment = []
  shell = "bash"
  [runners.feature_flags]
    FF_USE_ADVANCED_POD_SPEC_CONFIGURATION = true
    FF_USE_POD_ACTIVE_DEADLINE_SECONDS = true
    FF_PRINT_POD_EVENTS = true
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "alpine"
    namespace = ""
    namespace_overwrite_allowed = ""
    namespace_per_job = false
    privileged = true
    node_selector_overwrite_allowed = ".*"
    node_tolerations_overwrite_allowed = ""
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pull_policy = "always"
    allowed_pull_policies = ["always", "if-not-present", "never"]
    [runners.kubernetes.pod_labels]
    [runners.kubernetes.dns_config]

With the latest helper image, the job gets stuck. With the MR helper image (registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine3.18-x86_64-cafc08d9), it passes.

A user has tested the fix and confirmed that it works: https://gitlab.com/gitlab-com/request-for-help/-/issues/2215#note_2501726766

You can also run the added integration test (see thread 👉🏿 !5543 (comment 2518629393))

❯ go test -tags integration -timeout=60m -run "^TestFileArchiver*" -v gitlab.com/gitlab-org/gitlab-runner/commands/helpers
=== RUN   TestFileArchiver
    file_archiver_integration_test.go:57: Creating project structure in: /Users/ratchade/projects/main-runner/commands/helpers/test-TestFileArchiver-20250523-075003.416
**/project: found 17 matching artifact files and directories 
No URL provided, cache will not be uploaded to shared cache server. Cache will be stored only locally. 
    file_archiver_integration_test.go:51: Removing temporary directory: /Users/ratchade/projects/main-runner/commands/helpers/test-TestFileArchiver-20250523-075003.416
    file_archiver_integration_test.go:53: Removing archive: /Users/ratchade/projects/main-runner/commands/helpers/test-TestFileArchiver-20250523-075003.416.zip
--- PASS: TestFileArchiver (0.02s)
PASS
ok      gitlab.com/gitlab-org/gitlab-runner/commands/helpers    0.551s

What are the relevant issue numbers?

Closes https://gitlab.com/gitlab-com/request-for-help/-/issues/2215
