Introduce a class for safe file handling as an alternative to IO and File
🔥 Problem
In the past, we faced some Application Security issues with path traversals that allowed gitlab.yml
for example).
See
In one of the retrospectives, we discussed how we could improve things so that we can protect GitLab against those kind of vulnerabilities.
🚒 Solution
In the same way that we use Gitlab::HTTP
to make safe web requests, have a class that can make safe(r) file requests.
- Create class(es) that have the same interface as File and IO. Then add rubocop rules to prevent further introduction of direct calls to IO or File
- Add a
safe_read
method that accepts a path to read and the valid folder location, and we check it againstGitlab::Utils#check_path_traversal!
and#check_allowed_absolute_path!
- Then we go through the code slowly transitioning from
read(path)
tosafe_read(path, allowed_locations)
We could later do the same for write
methods and readlines
and all those sort of extra methods.
class Gitlab
class IO
def read(*args)
super(args)
end
def safely_read(path, allowlist, *args)
Gitlab::Utils.check_path_traversal!(path)
Gitlab::Utils.check_allowed_absolute_path!(path, allowlist)
raise if File.symlink?(path)
super(path, args)
end
end
end
module Gitlab
module ImportExport
module Json
class NdjsonReader
MAX_JSON_DOCUMENT_SIZE = 50.megabytes
attr_reader :dir_path
def initialize(dir_path)
@dir_path = dir_path
@consumed_relations = Set.new
end
def exist?
Dir.exist?(@dir_path)
end
def consume_attributes(importable_path)
# This reads from `tree/project.json`
path = file_path("#{importable_path}.json")
raise Gitlab::ImportExport::Error, 'Invalid file' if !File.exist?(path) || File.symlink?(path)
--- data = File.read(path, MAX_JSON_DOCUMENT_SIZE)
+++ data = Gitlab::IO.read(path, dir_path, MAX_JSON_DOCUMENT_SIZE)
json_decode(data)
end
😿 Downsides
🎯 Coverage is GitLab code only
This protection wouldn't apply to dependencies, which is where a lot of file handling happens
GitLab-rails:
$ git grep -E '(File.read|IO.read)' | grep -v '_spec.rb' | grep -v 'spec/' | wc -l
181
Gems used by GitLab-rails:
$ grep -R -E '(File.read|IO.read)' * | grep -v '_test.rb' | grep -v '_spec.rb' | wc -l
1500
😕 The caller might not know the allowed dirs
The allowlist might be dynamic. Who validates the dir? (E.g. in the diff example above).
We could have a deny list of paths though, to accompany (and override??...) the allowlist
/etc/*
"#{Rails.root}/config/*.yml"
🔮 Alternatives
-
https://gitlab.com/gitlab-org/gitlab/-/issues/349719+ would mean that, even if there is arbitrary file read, it can't read sensitive files. But it may prove difficult to define "sensitive" files, as the GitLab app still needs to read private repo data for example. Not as "critical" as
secrets.yml
, but still bad! So it'd be less of an alternative and more of a "do this too" - https://gitlab.com/gitlab-org/gitlab/-/issues/413766+