Do not log an attempt for newlines in the traversal path middleware
Context
In Rack middleware for path traversal checks (!123477 - merged), we introduced a middleware that checks for path traversals.
This was introduced behind a feature flag. Given that this middleware could break existing user workflows, this middleware will only log attempts (for now). In other words, a request will never be rejected because of this middleware.
During the feature flag (slow) rollout, we noticed that the newline character ('\n') was detected as a path traversal attempt. This will categorize genuine requests as attempts. For example, GraphQL requests. See this comment.
In this middleware, we're updating the detection part to ignore \n
chars.
See #455861+
🔬 What does this MR do and why?
- In the path traversal util class, split the existing regex in two:
- The base without
\n
. - The base +
\n
.
- The base without
- While at it, remove the
skip_decoding:
option. This option was introduced for the middleware (as the middleware already decodes the path). Instead of using#check_path_traversal!
, we can use directly the regex (see previous point).- This has the upside of not logging path traversal attempts twice.
Note that the middleware is heavily behind feature flag. Hence, no changelog.
✅ MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
🌈 Screenshots or screen recordings
No changes in the UI
🔍 How to set up and validate locally
We're going to use the following GraphQL query to test the change:
{
currentUser {
username
}
}
The difference is that we're going to send the query
as an (http) query parameter. This (http) query parameter will be encoded. Its value will be %7B%0A++currentUser+%7B%0A++++username%0A++%7D%0A%7D
. Notice the %0A
which is \n
encoded.
Enable the middleware feature flag :
Feature.enable(:check_path_traversal_middleware)
1️⃣ On master
- Log in with your browser.
- Navigate to
/api/graphql?query=%7B%0A++currentUser+%7B%0A++++username%0A++%7D%0A%7D
. - Check
tail -f log/application_json.log
:
{"severity":"WARN","time":"2024-04-15T10:14:06.184Z","correlation_id":"01HVGKZTFHY6NVW4ZE1WT06DEP","message":"Potential path traversal attempt detected","path":"/api/graphql?query={\n currentUser {\n username\n }\n}"}
{"severity":"WARN","time":"2024-04-15T10:14:08.502Z","correlation_id":"01HVGKZTFHY6NVW4ZE1WT06DEP","meta.caller_id":"GraphqlController#execute","meta.remote_ip":"172.16.123.1","meta.feature_category":"not_owned","meta.user":"root","meta.user_id":1,"meta.client_id":"user/1","method":"GET","fullpath":"/api/graphql?query=%7B%0A++currentUser+%7B%0A++++username%0A++%7D%0A%7D","message":"Potential path traversal attempt detected","status":304,"class_name":"Gitlab::Middleware::PathTraversalCheck"}
Notice that we have 2
logs:
- The one coming from the
#check_path_traversal!
function. - One coming from the middleware itself.
2️⃣ With this MR
- Log in with your browser.
- Navigate to
/api/graphql?query=%7B%0A++currentUser+%7B%0A++++username%0A++%7D%0A%7D
. - Check
tail -f log/application_json.log
:
(empty)
No new rows in the log because the
3️⃣ With this MR + path traversal attempt
Let's see the changes in the logs with a genuine path traversal attempt.
- Log in with your browser.
- Navigate to
/foo/../bar
. - Check
tail -f log/application_json.log
:
{"severity":"WARN","time":"2024-04-15T10:18:16.311Z","correlation_id":"01HVGM7E4VKX3VGNTEH47R8D88","meta.caller_id":"ApplicationController#route_not_found","meta.remote_ip":"172.16.123.1","meta.user":"root","meta.user_id":1,"meta.client_id":"user/1","method":"GET","fullpath":"/foo%2F..%2Fbar","message":"Potential path traversal attempt detected","status":404,"class_name":"Gitlab::Middleware::PathTraversalCheck"}
Only one log: the one coming from the middleware.