Fix cache key sanitation issues, esp. re. "cache key files"
What does this MR do?
Fix cache key sanitation issues, esp. re. "cache key files"
GitLab can send us various cache keys as part of the JobResponse. For example, when "cache key files" is used, the key can look a lot like a file path. The existing cache key validation currently blocks those cache keys.
The overall goal of the cache key validation is to guard against file path traversal issues. So we now accept that cache keys can be file paths, or something that looks like one.
First we do some "adjustments" on the input cache keys:
- convert some URL-encoded sequences (ones we know might be dangerous) into their ASCII equivalents
- replace all \ with /
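As a rough illustration of these two adjustments, here is a minimal sketch. The helper name adjustKey and the list of decoded sequences (%2F, %5C, %2E) are assumptions made for this example; this description does not say which sequences the MR actually handles.

```go
package main

import (
	"fmt"
	"strings"
)

// encodedReplacer decodes a small, assumed set of URL-encoded sequences.
// The real list used by the MR is not given in this description.
var encodedReplacer = strings.NewReplacer(
	"%2f", "/", "%2F", "/",
	"%5c", "\\", "%5C", "\\",
	"%2e", ".", "%2E", ".",
)

// adjustKey is a hypothetical helper: it first decodes the assumed
// sequences, then turns every backslash into a forward slash.
func adjustKey(key string) string {
	key = encodedReplacer.Replace(key)
	return strings.ReplaceAll(key, "\\", "/")
}

func main() {
	fmt.Println(adjustKey(`..%2F..%5Csecret`)) // prints "../../secret"
}
```

Decoding before replacing backslashes means an encoded backslash (%5C) ends up as a forward slash too, so the later path cleaning sees a single separator style.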
Then we run the cache key through path.Clean, but make sure to explicitly root it first, to guard against "escaping" path traversals. This resolves any . and .. segments to a clean path.
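The effect of rooting the key before path.Clean can be sketched as follows. The helper name cleanRooted is hypothetical, and stripping the leading / again after cleaning is an assumption of this sketch, not something this description confirms.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cleanRooted is a hypothetical helper. It roots the key at "/" before
// calling path.Clean, so ".." segments cannot climb above the root.
func cleanRooted(key string) string {
	cleaned := path.Clean("/" + key)
	// Dropping the leading "/" again is an assumption of this sketch.
	return strings.TrimPrefix(cleaned, "/")
}

func main() {
	for _, key := range []string{"a/./b/../c", "../../etc/passwd", ".."} {
		fmt.Printf("%q -> %q\n", key, cleanRooted(key))
	}
}
```

Without the rooting step, path.Clean("../../etc/passwd") returns the path unchanged, because leading .. segments are kept for non-rooted paths. With the rooting step, the same input collapses to etc/passwd.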
Then we ensure the last path segment does not end in whitespace; if it does, we cut the trailing whitespace off. We have seen that trailing whitespace can be dangerous, and we did not allow it previously.
Lastly, if the last path segment is empty, we cut that segment off. This might lead to an empty cache key, but the caller handles that.
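The last two steps might look roughly like the sketch below. The helper name trimLastSegment is hypothetical, and the sketch assumes the key has already been cleaned and is no longer rooted; it does a single pass, as described above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimLastSegment is a hypothetical helper. It removes trailing
// whitespace from the key (which is the end of the last path segment)
// and then drops the last segment if it has become empty.
func trimLastSegment(key string) string {
	key = strings.TrimRightFunc(key, unicode.IsSpace)
	// A trailing "/" means the last segment is empty, so cut it off.
	return strings.TrimSuffix(key, "/")
}

func main() {
	for _, key := range []string{"a/b  ", "a/b/ ", "   "} {
		fmt.Printf("%q -> %q\n", key, trimLastSegment(key))
	}
}
```

A key that consists only of whitespace ends up empty here, which matches the note that the caller has to handle an empty cache key.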
When we sanitize a cache key, or a cache key can't be sanitized, we still log that as a warning to the user.
Notes:
- The sanitized cache key is only used for the object's name in the bucket (and some logging), not for the local file name.
- The actual/proper fix for all of this is to just hash the cache key, which is around the corner ...
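To show why hashing sidesteps the sanitization problem, here is a minimal sketch. SHA-256 with hex encoding and the helper name hashKey are assumptions made for illustration; this description does not say which hash or encoding the planned fix will use.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashKey is a hypothetical helper. It maps any cache key to a
// fixed-length hex string, so the result never contains path
// separators, dots, or whitespace, whatever the input looks like.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(hashKey("../../etc/passwd"))
	fmt.Println(len(hashKey("any key at all"))) // prints 64
}
```

Because the output alphabet is limited to 0-9 and a-f, the object name derived from it cannot express a path traversal, and no per-character sanitization is needed.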
Why was this MR needed?
To fix issues with the cache key sanitation, esp. when cache:key:files is used.
What's the best way to test this MR?
- use a cache in the clouds (S3, ...)
- run with "cache key files" and ensure this works (does not block using that cache)