I posit that this check is no longer important, for the following reasons:
UTF-7 is only rendered by IE 11 under very limited conditions, which have nothing to do with 'mismatched mimetypes'
UTF-16 encoded attack strings will still be interpreted regardless of whether the charset is set or not (provided the attack string is injected at the beginning of the document)
Browsers no longer (or maybe never did) 'switch' encoding if the charsets mismatch
For reference, the check's current description reads:

This check identifies responses where the HTTP Content-Type header declares a charset different from the charset defined by the body of the HTML or XML. When there’s a charset mismatch between the HTTP header and the content body, Web browsers can be forced into an undesirable content-sniffing mode to determine the content’s correct character set.
An attacker could manipulate content on the page to be interpreted in an encoding of their choice. For example, if an attacker can control content at the beginning of the page, they could inject script using UTF-7 encoded text and manipulate some browsers into interpreting that text.
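To make concrete the kind of mismatch the check flags, here is a minimal sketch (the function name and regexes are illustrative, not the actual scanner code):

```python
import re

# Illustrative sketch only: compare the charset declared in the
# Content-Type header with the charset declared in the body.
HEADER_CHARSET = re.compile(r"charset=([\w-]+)", re.IGNORECASE)
META_CHARSET = re.compile(rb"<meta[^>]*charset=[\"']?([\w-]+)", re.IGNORECASE)

def charsets_mismatch(content_type: str, body: bytes) -> bool:
    header = HEADER_CHARSET.search(content_type)
    meta = META_CHARSET.search(body)
    if not header or not meta:
        return False  # nothing to compare
    return header.group(1).lower() != meta.group(1).decode("ascii").lower()

# A response this check would flag:
print(charsets_mismatch(
    "text/html; charset=ISO-8859-1",
    b'<html><head><meta charset="UTF-8"></head><body>...</body></html>',
))  # True
```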
This check was written many years ago, before various browser security enhancements. The line below is actually wrong:
"When there’s a charset mismatch between the HTTP header and the content body, Web browsers can be forced into an undesirable content-sniffing mode to determine the content’s correct character set."
The mismatch itself has nothing to do with it; what matters is where the encoded injected data ends up. None of the browsers tested will 'switch' encoding once the encoding type has been detected from the first few bytes. You need to be able to inject at the very first byte of a response for an attack string to be interpreted in the attacker's chosen encoding.
This is true regardless of whether a Content-Type: text/html; charset=... value is set (except for IE 11). Even with a charset properly set in the Content-Type header, if UTF-16 text appears at the beginning of the document, the browser will still interpret it as UTF-16. The only exception I could find was IE 11, which validates that the document actually matches the declared encoding when a charset value is set in the Content-Type header.
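To illustrate (a rough sketch of my own, not from the check), the UTF-16 bytes have to occupy the very first bytes of the response body; the declared charset doesn't change how they get sniffed:

```python
import codecs

payload = "<script>alert(document.domain)</script>"

# For the browser to treat the document as UTF-16, these bytes must start
# at byte 0 of the response body, regardless of the Content-Type charset
# (IE 11 being the exception noted above).
body = codecs.BOM_UTF16_LE + payload.encode("utf-16-le")

print(body[:12])  # b'\xff\xfe<\x00s\x00c\x00r\x00i\x00'
```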
The description's example suffers from the same problem: "For example, if an attacker can control content at the beginning of the page, they could inject script using UTF-7 encoded text and manipulate some browsers into interpreting that text."
UTF-7 was only really exploitable against IE 9 and IE 11. While UTF-7 is still supported by IE 11 today, it will only be rendered if the Content-Type: text/html; charset=UTF-7 header is set or the <meta charset=UTF-7> element is present.
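For reference, this is roughly what such a payload looks like; Python's built-in utf-7 codec is used here purely for illustration:

```python
payload = "<script>alert(1)</script>"

# IE 11 will only render this as markup if the response carries
# Content-Type: text/html; charset=UTF-7 or a <meta charset=UTF-7> element.
print(payload.encode("utf-7"))  # e.g. b'+ADw-script+AD4-alert(1)+ADw-/script+AD4-'
```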
However, UTF-16BE/LE encodings can still potentially be used to bypass XSS filters, but again, the charset has little bearing on this compared to where the injected encoded string ends up being returned.
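A minimal sketch of why, using a hypothetical byte-level filter purely for illustration:

```python
import re

# Hypothetical naive filter: it scans the raw bytes for an ASCII "<script"
# and never sees the same tag once the payload is UTF-16 encoded,
# whatever charset the header declares.
naive_filter = re.compile(rb"<script", re.IGNORECASE)

ascii_body = b"<script>alert(1)</script>"
utf16_body = "<script>alert(1)</script>".encode("utf-16-le")

print(bool(naive_filter.search(ascii_body)))  # True  - caught
print(bool(naive_filter.search(utf16_body)))  # False - bypassed
```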
@idawson This makes a lot of sense to me. I'm fine with not implementing it. Now I just have to decide what to do with the ones that we aren't implementing, even as a second pass for less important checks. I'd like this epic to only contain checks that we are implementing or, if they are closed, have implemented. I think I'll move this to a different epic for deprecated passive checks and close it there.
@derekferguson I updated the epic to a new one I created, and closed the issue so that we can better track the remaining work. Please feel free to update labels/epics if you want to categorize differently.