Experiment: What's the maximum number of rules that could come from one single filter?
Background
We have filters, which live inside the subscriptions we support; EasyList is one example. Chrome now uses declarativeNetRequest (DNR) rules.
Currently, we use a fancy script to convert those filters into rules.
So for example this filter: &adstrade=
gets converted into this rule:
{
  "priority": 1000,
  "condition": {
    "urlFilter": "&adstrade=",
    "isUrlFilterCaseSensitive": false
  },
  "action": {
    "type": "block"
  },
  "id": 40142
}
In this case the relationship is 1 to 1, but in other cases one filter could produce n rules.
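As a rough illustration of the mapping above, here is a minimal sketch. It is not the real conversion script: `convertSimpleBlockingFilter` is a hypothetical helper that handles only plain substring filters with no options, and it copies the priority and case-sensitivity values from the example rule. It returns an array to reflect that, in general, one filter can yield n rules.

```javascript
// Hypothetical helper, not the real conversion script. It handles only
// plain substring blocking filters without options, such as "&adstrade=".
function convertSimpleBlockingFilter(filterText, id) {
  if (filterText.includes("$") || filterText.startsWith("@@")) {
    throw new Error("This sketch only supports plain blocking filters");
  }
  // An array is returned because one filter may map to n rules in general.
  return [{
    priority: 1000,
    condition: {
      urlFilter: filterText,
      isUrlFilterCaseSensitive: false
    },
    action: {type: "block"},
    id
  }];
}

const simpleRules = convertSimpleBlockingFilter("&adstrade=", 40142);
console.log(JSON.stringify(simpleRules, null, 2));
```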
Use case
To inform the architecture of webext#389 and core#476 (the reporting stuff), we need to know the upper limit of n across all the MV3 subscriptions we currently have.
What to change
No changes are expected from this experiment.
Findings
There are four types of filters and rules:
- CSP
- Allowing
- Redirect
- Blocking
Blocking and Redirect are the simplest: each filter generates exactly one rule, so the relationship is 1 to 1. Allowing filters can generate 1 or 2 rules, depending on the filter properties. (Open question: why?) CSP is the most critical type because a CSP filter that uses the $domain option generates multiple rules, one for each domain.
Two comments copied directly from the code explain the delicate situation around CSP filters:
// The DNR makes no distinction between CSP rules and main_frame/sub_frame
// rules. Ideally, we would give CSP rules a different priority therefore,
// to ensure that a $csp exception filter would not accidentally allowlist
// the whole website. Unfortunately, I don't think that's possible if we are
// to also support the distinction between specific and generic rules.
// Luckily, we are adding an "allow" rule (not "allowAllRequest") here and
// there is no such thing as a blocking filter which applies to the
// $document (main_frame), so we don't have to worry about that. There is
// such a thing as a $subdocument blocking filter though, which a $csp
// exception filter should not usually affect.
// As a compromise in order to support both $csp and $genericblock, we
// accept that $csp exception filters might wrongly prevent frame-blocking
// filters from matching. If this compromise proves problematic, we might
// need to reconsider this in the future.
// Chromium doesn't consider main_frame requests to have initiated from their
// URL, so the domains/excludedDomains rule conditions won't work as expected
// for main_frame requests. This is a problem for $csp filters which also use
// the $domain option. As a partial workaround, we generate a separate
// urlFilter rule for each domain. But note, we can't support excludedDomains
// ($csp=...$~domain=...) or urlFilter conditions (||...$csp=...,domain=...).
// See https://bugs.chromium.org/p/chromium/issues/detail?id=1207326
//
// Note: Hopefully, this workaround won't be necessary for long, but if we
// need it long-term, then perhaps we should generate one rule with a longer
// regexFilter condition instead. But if we do that, we will need to be
// careful not to hit the memory limit for regular expression rule conditions
// and also to match subdomains correctly.
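The per-domain workaround described in the second comment can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the repository: `cspRulesForDomains`, the priority value, and the `||domain^` pattern are hypothetical choices, while `modifyHeaders`, `responseHeaders`, `urlFilter`, and `resourceTypes` are regular DNR rule fields. It shows why a CSP filter listing k domains ends up as k rules.

```javascript
// Hypothetical sketch of the per-domain workaround; the function name,
// priority, and urlFilter pattern are assumptions for illustration.
function cspRulesForDomains(csp, domains, firstId) {
  return domains.map((domain, index) => ({
    id: firstId + index,
    priority: 1000,
    condition: {
      // One urlFilter per domain replaces the domains condition, which
      // does not work as expected for main_frame requests.
      urlFilter: `||${domain}^`,
      resourceTypes: ["main_frame", "sub_frame"]
    },
    action: {
      type: "modifyHeaders",
      responseHeaders: [{
        header: "Content-Security-Policy",
        operation: "append",
        value: csp
      }]
    }
  }));
}

const cspRules = cspRulesForDomains(
  "script-src 'self'",
  ["example.com", "example.org", "example.net"],
  1
);
console.log(cspRules.length); // 3
```

With this shape, the rule count grows linearly with the length of the $domain list.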
I analyzed the following subscriptions:
- EasyList
- EasyList Germany+EasyList
- Allow nonintrusive advertising
- ABP filters (compliance)
In an Excel sheet (should I provide the link to the sheet here? I don't know how sensitive this data is), I compiled the number of rules generated by each filter, broken down by the 4 types of rules. The main conclusions I was able to draw from the data:
- The maximum number of rules generated by a single filter is 349. It's a CSP filter that uses 349 domains.
- The vast majority of filters generate a single rule. "Allow nonintrusive advertising" is the only subscription with allowing filters that generate two rules.
- Approximately 58% of the filters are URL filters and 42% are content filters.
- Some filters generate zero rules. (Why? Is this intended?)
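For anyone who wants to reproduce the per-filter counts without the spreadsheet, the tally could be sketched like this. The `convert` parameter is a hypothetical stand-in for the real conversion script, assumed to take one filter text and return an array of rules (possibly empty). The toy converter and its counts are made up purely to make the example runnable.

```javascript
// `convert` is a hypothetical stand-in for the real conversion script:
// it takes one filter text and returns an array of rules (may be empty).
function summarizeRuleCounts(filterTexts, convert) {
  const histogram = new Map(); // rules per filter -> number of filters
  let max = {count: 0, filter: null};

  for (const text of filterTexts) {
    const count = convert(text).length;
    histogram.set(count, (histogram.get(count) || 0) + 1);
    if (count > max.count) {
      max = {count, filter: text};
    }
  }
  return {max, histogram};
}

// Toy data for demonstration only; these rule counts are made up.
const toyCounts = new Map([
  ["&adstrade=", 1],
  ["filter-with-three-domains", 3],
  ["filter-with-no-rules", 0]
]);
const toyConvert = text =>
  Array.from({length: toyCounts.get(text)}, () => ({}));

const summary = summarizeRuleCounts([...toyCounts.keys()], toyConvert);
console.log(summary.max);
console.log([...summary.histogram]);
```

The histogram also makes the zero-rule filters easy to list for follow-up.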