Introduce a class for safe file handling as an alternative to IO and File
<!--IssueSummary start--> <details> <summary> Everyone can contribute. [Help move this issue forward](https://handbook.gitlab.com/handbook/marketing/developer-relations/contributor-success/community-contributors-workflows/#contributor-links) while earning points, leveling up and collecting rewards. </summary> - [Close this issue](https://contributors.gitlab.com/manage-issue?action=close&projectId=278964&issueIid=413916) </details> <!--IssueSummary end--> ## :fire: Problem In the past, we faced some ~"Application Security" issues with path traversals that allowed :smiling_imp: users to get the contents of an arbitrary file in the system (`gitlab.yml` for example). See * https://gitlab.com/gitlab-org/gitlab/-/issues/412371 * https://gitlab.com/gitlab-org/gitlab/-/issues/409622 In one of the retrospectives, we [discussed](https://gitlab.com/gitlab-com/gl-security/rcas/-/issues/9#note_1405310422) how we could improve things so that we can protect GitLab against those kind of vulnerabilities. ## :fire_engine: Solution In the same way that we use `Gitlab::HTTP` to make safe web requests, have a class that can make safe(r) file requests. - Create class(es) that have the same interface as File and IO. Then add rubocop rules to prevent further introduction of direct calls to IO or File - Add a `safe_read` method that accepts a path to read and the valid folder location, and we check it against `Gitlab::Utils#check_path_traversal!` and `#check_allowed_absolute_path!` - Then we go through the code slowly transitioning from `read(path)` to `safe_read(path, allowed_locations)` We could later do the same for `write` methods and `readlines` and all those sort of extra methods. ```ruby class Gitlab class IO def read(*args) super(args) end def safely_read(path, allowlist, *args) Gitlab::Utils.check_path_traversal!(path) Gitlab::Utils.check_allowed_absolute_path!(path, allowlist) raise if File.symlink?(path) super(path, args) end end end ``` ```diff module Gitlab module ImportExport module Json class NdjsonReader MAX_JSON_DOCUMENT_SIZE = 50.megabytes attr_reader :dir_path def initialize(dir_path) @dir_path = dir_path @consumed_relations = Set.new end def exist? Dir.exist?(@dir_path) end def consume_attributes(importable_path) # This reads from `tree/project.json` path = file_path("#{importable_path}.json") raise Gitlab::ImportExport::Error, 'Invalid file' if !File.exist?(path) || File.symlink?(path) --- data = File.read(path, MAX_JSON_DOCUMENT_SIZE) +++ data = Gitlab::IO.read(path, dir_path, MAX_JSON_DOCUMENT_SIZE) json_decode(data) end ``` ### :crying_cat_face: Downsides #### :dart: Coverage is GitLab code only This protection wouldn't apply to dependencies, which is where a lot of file handling happens GitLab-rails: ``` $ git grep -E '(File.read|IO.read)' | grep -v '_spec.rb' | grep -v 'spec/' | wc -l 181 ``` Gems used by GitLab-rails: ``` $ grep -R -E '(File.read|IO.read)' * | grep -v '_test.rb' | grep -v '_spec.rb' | wc -l 1500 ``` #### :confused: The caller might not know the allowed dirs The allowlist might be dynamic. Who validates the dir? (E.g. in the diff example above). We could have a deny list of paths though, to accompany (and override??...) the allowlist - `/etc/*` - `"#{Rails.root}/config/*.yml"` ### :crystal_ball: Alternatives - https://gitlab.com/gitlab-org/gitlab/-/issues/349719+ would mean that, even if there is arbitrary file read, it can't read sensitive files. But it may prove difficult to define "sensitive" files, as the GitLab app still needs to read private repo data for example. Not as "critical" as `secrets.yml`, but still bad! So it'd be less of an alternative and more of a "do this too" - https://gitlab.com/gitlab-org/gitlab/-/issues/413766+
issue