Introduce a class for safe file handling as an alternative to IO and File
<!--IssueSummary start-->
<details>
<summary>
Everyone can contribute. [Help move this issue forward](https://handbook.gitlab.com/handbook/marketing/developer-relations/contributor-success/community-contributors-workflows/#contributor-links) while earning points, leveling up and collecting rewards.
</summary>
- [Close this issue](https://contributors.gitlab.com/manage-issue?action=close&projectId=278964&issueIid=413916)
</details>
<!--IssueSummary end-->
## :fire: Problem
In the past, we faced some ~"Application Security" issues with path traversals that allowed :smiling_imp: users to get the contents of an arbitrary file in the system (`gitlab.yml` for example).
See
* https://gitlab.com/gitlab-org/gitlab/-/issues/412371
* https://gitlab.com/gitlab-org/gitlab/-/issues/409622
In one of the retrospectives, we [discussed](https://gitlab.com/gitlab-com/gl-security/rcas/-/issues/9#note_1405310422) how we could improve things so that we can protect GitLab against those kind of vulnerabilities.
## :fire_engine: Solution
In the same way that we use `Gitlab::HTTP` to make safe web requests, have a class that can make safe(r) file requests.
- Create class(es) that have the same interface as File and IO. Then add rubocop rules to prevent further introduction of direct calls to IO or File
- Add a `safe_read` method that accepts a path to read and the valid folder location, and we check it against `Gitlab::Utils#check_path_traversal!` and `#check_allowed_absolute_path!`
- Then we go through the code slowly transitioning from `read(path)` to `safe_read(path, allowed_locations)`
We could later do the same for `write` methods and `readlines` and all those sort of extra methods.
```ruby
class Gitlab
class IO
def read(*args)
super(args)
end
def safely_read(path, allowlist, *args)
Gitlab::Utils.check_path_traversal!(path)
Gitlab::Utils.check_allowed_absolute_path!(path, allowlist)
raise if File.symlink?(path)
super(path, args)
end
end
end
```
```diff
module Gitlab
module ImportExport
module Json
class NdjsonReader
MAX_JSON_DOCUMENT_SIZE = 50.megabytes
attr_reader :dir_path
def initialize(dir_path)
@dir_path = dir_path
@consumed_relations = Set.new
end
def exist?
Dir.exist?(@dir_path)
end
def consume_attributes(importable_path)
# This reads from `tree/project.json`
path = file_path("#{importable_path}.json")
raise Gitlab::ImportExport::Error, 'Invalid file' if !File.exist?(path) || File.symlink?(path)
--- data = File.read(path, MAX_JSON_DOCUMENT_SIZE)
+++ data = Gitlab::IO.read(path, dir_path, MAX_JSON_DOCUMENT_SIZE)
json_decode(data)
end
```
### :crying_cat_face: Downsides
#### :dart: Coverage is GitLab code only
This protection wouldn't apply to dependencies, which is where a lot of file handling happens
GitLab-rails:
```
$ git grep -E '(File.read|IO.read)' | grep -v '_spec.rb' | grep -v 'spec/' | wc -l
181
```
Gems used by GitLab-rails:
```
$ grep -R -E '(File.read|IO.read)' * | grep -v '_test.rb' | grep -v '_spec.rb' | wc -l
1500
```
#### :confused: The caller might not know the allowed dirs
The allowlist might be dynamic. Who validates the dir? (E.g. in the diff example above).
We could have a deny list of paths though, to accompany (and override??...) the allowlist
- `/etc/*`
- `"#{Rails.root}/config/*.yml"`
### :crystal_ball: Alternatives
- https://gitlab.com/gitlab-org/gitlab/-/issues/349719+ would mean that, even if there is arbitrary file read, it can't read sensitive files. But it may prove difficult to define "sensitive" files, as the GitLab app still needs to read private repo data for example. Not as "critical" as `secrets.yml`, but still bad! So it'd be less of an alternative and more of a "do this too"
- https://gitlab.com/gitlab-org/gitlab/-/issues/413766+
issue