SECURITY: Unicode canonicalization vulnerability

MSVR 50480
Vendor: GNU

There is a Unicode canonicalization vulnerability in libidn and libidn2
which can be used to bypass security filters that attempt to ensure that
an untrusted URL belongs to a particular domain.

When an IDN URL is parsed, it is converted to its NFKC normalization
form. There are a number of Unicode characters which, when converted to
their NFKC form, become characters that are syntactically significant in
a URL. As an example, the character “℀” (U+2100) becomes the ASCII “a/c”,
which includes the URL path separator character.
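
The normalization step described above can be reproduced outside of
libidn. The following is a minimal sketch that uses Python's standard
unicodedata module rather than libidn itself, so it only illustrates
what NFKC does to the character; the helper name nfkc is my own.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Return the NFKC (compatibility composed) form of text."""
    return unicodedata.normalize("NFKC", text)


# U+2100 ACCOUNT OF is a single code point ...
account_of = "\u2100"
# ... that expands to three ASCII characters, one of them a slash.
print(repr(nfkc(account_of)))

# The same expansion inside a hostname-looking string.
print(nfkc("evil.c\u2100.microsoft.com"))
```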

With the exception of using libidn2 with the “--no-tr46” flag (or
equivalent), both libidn and libidn2 accept URLs that contain
characters which will normalize in this fashion, meaning that a URL like
“https://evil.c℀.microsoft.com/some.txt” will become
“https://evil.ca/c.microsoft.com/some.txt”. In scenarios where a service
attempts to validate that a user-provided URL belongs to a particular
domain or subdomain, often a regular expression is used to ensure that
the hostname portion of the URL ends with “.trusteddomain.com”. This is
typically done before converting to ASCII, which is often done only to
insert the URL in an HTTP response or to make a web request. Because the
normalization performed by libidn effectively changes the hostname of
the URL, a security check like this can be bypassed by including Unicode
characters whose normalization form includes syntax-significant
characters for URLs.
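
The bypass can be sketched as follows. This is an illustration under
assumptions, not code from any real service: the regular expression,
the trusted suffix, and the function names are hypothetical, and
Python's unicodedata stands in for the normalization that libidn
performs later in the pipeline.

```python
import re
import unicodedata
from urllib.parse import urlsplit

# Hypothetical allow-listed suffix and host-extraction regex.
TRUSTED_SUFFIX = ".microsoft.com"
HOST_RE = re.compile(r"^https://([^/?#]+)")


def naive_is_trusted(url: str) -> bool:
    """Check the hostname suffix on the raw, un-normalized URL."""
    match = HOST_RE.match(url)
    return bool(match) and match.group(1).endswith(TRUSTED_SUFFIX)


def host_after_nfkc(url: str) -> str:
    """Hostname that a client sees once the URL has been NFKC-normalized."""
    normalized = unicodedata.normalize("NFKC", url)
    return urlsplit(normalized).hostname


url = "https://evil.c\u2100.microsoft.com/some.txt"
print(naive_is_trusted(url))   # the filter accepts the URL
print(host_after_nfkc(url))    # but the request goes to another host
```

The check and the request disagree about the hostname because the check
runs before normalization and the request runs after it.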

This is particularly significant for OAuth authenticators. Various OAuth
flows involve an application providing an application ID and a redirect
URL to an authenticator service. The authenticator service then verifies
that the redirect URL belongs to the allow-listed hostname (or a
subdomain of this hostname) for the application ID before making a
request (or issuing a redirect) to the provided URL with an
authentication token. An attacker can leverage the behavior in
libidn/libidn2 to bypass the subdomain allow-list check by constructing
a URL like “https://evil.c℀.microsoft.com/some.txt”. This URL will be
parsed as being a subdomain of microsoft.com, which in this scenario is
the allow-listed domain for the application. However, when a request or
redirect is issued to this URL, the request will actually be made to
https://evil.ca/c.microsoft.com/some.txt, causing the authentication
token to go to the attacker’s evil.ca service instead.

The fix I recommend for this vulnerability is to return an error when a
call to libidn or libidn2 is made with a URL whose NFKC normalization
form includes a syntax-significant character (“/”, “?”, “#”, “@”, or
“:”) that wasn’t present in the original URL.
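
One possible shape for such a check is sketched below. It is not the
libidn or libidn2 implementation; the names URL_DELIMITERS and
introduces_delimiter are hypothetical, and normalizing each character
separately is my own simplification. The idea is to flag any input
character that is not itself a delimiter but whose NFKC form contains
one.

```python
import unicodedata

# The syntax-significant characters listed in the recommendation.
URL_DELIMITERS = frozenset("/?#@:")


def introduces_delimiter(url: str) -> bool:
    """Return True if NFKC would add a URL delimiter absent from url."""
    for ch in url:
        if ch in URL_DELIMITERS:
            # Delimiters already present in the original are allowed.
            continue
        if URL_DELIMITERS & set(unicodedata.normalize("NFKC", ch)):
            return True
    return False


def check_url(url: str) -> str:
    """Return url unchanged, or raise if normalization would alter its syntax."""
    if introduces_delimiter(url):
        raise ValueError("URL gains a delimiter under NFKC normalization")
    return url


print(introduces_delimiter("https://evil.c\u2100.microsoft.com/some.txt"))
print(introduces_delimiter("https://www.microsoft.com/some.txt"))
```

Checking character by character, rather than testing whether the
normalized string contains “/” at all, matters because every URL
already contains delimiters such as the “://” after the scheme.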

Assorted repros follow:
$ echo $'microsoft.c\u2100.visualstudio.microsoft.com' | idn2
microsoft.ca/c.visualstudio.microsoft.com

$ echo $'microsoft.c\u2100.visualstudio.microsoft.com' | idn | xargs curl -v
*   Trying 13.77.161.179...
* TCP_NODELAY set
* Connected to microsoft.ca (13.77.161.179) port 80 (#0)
> GET /c.visualstudio.microsoft.com HTTP/1.1
> Host: microsoft.ca
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Fri, 22 Feb 2019 00:23:29 GMT
< Server: Kestrel
< Content-Length: 0
< Location: https://www.microsoft.ca/c.visualstudio.microsoft.com