Handle HTML entities when parsing URLs

I recently saw HTML entities within URLs, e.g. www.example.com/&uuml;.txt (www.example.com/übel.txt).

Firefox translates &uuml; into UTF-8 and thus sends %C3%BC in the GET request.
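To reproduce what Firefox sends, the UTF-8 bytes of the decoded character have to be percent-encoded. Below is a minimal sketch in C, assuming that everything outside the RFC 3986 "unreserved" set (plus '/', so paths stay readable) gets escaped. The function names are hypothetical and not part of any Wget API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: RFC 3986 unreserved characters, plus '/' as an
 * assumption so that path separators are not escaped. */
static int is_unreserved_or_slash(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
}

/* Hypothetical helper: percent-encode every other byte of `in` (e.g. the
 * UTF-8 bytes C3 BC of 'ü' become "%C3%BC"). Writes a NUL-terminated
 * result to `out`; returns 0 on success, -1 if `outsize` is too small. */
static int percent_encode_path(const char *in, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (is_unreserved_or_slash(*p)) {
            if (o + 1 >= outsize) return -1;   /* room for char + NUL */
            out[o++] = (char)*p;
        } else {
            if (o + 3 >= outsize) return -1;   /* room for "%XX" + NUL */
            out[o++] = '%';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        }
    }
    if (o >= outsize) return -1;
    out[o] = '\0';
    return 0;
}
```

Applied to the decoded path "/ü.txt" (bytes 2F C3 BC 2E 74 78 74), this yields "/%C3%BC.txt", matching Firefox's request.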

Wget 1.x does no conversion and thus sends &uuml; literally in the GET request, which the server in my case accepted as well. I did not test whether a server like Apache translates this into the correct filename. Any opinions on how we should treat this?

There are also &#nnnn; and &#xhhhh; variants, which need a Unicode-to-UTF-8 conversion (straightforward; I already have a working function).
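For illustration only (this is not the working function mentioned above), the numeric case could look roughly like the following C sketch. It assumes surrogates and values above U+10FFFF should be rejected; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point `cp` as UTF-8 into `buf` (needs room for 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and values
 * above U+10FFFF, which are not valid Unicode scalar values. */
static size_t codepoint_to_utf8(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Parse "&#nnnn;" (decimal) or "&#xhhhh;" (hex) at the start of `s`.
 * On success stores the code point in *cp and returns the number of
 * characters consumed, including '&' and ';'. Returns 0 otherwise. */
static size_t parse_numeric_entity(const char *s, uint32_t *cp)
{
    const char *p = s;
    uint32_t base = 10, value = 0;
    size_t digits = 0;

    if (p[0] != '&' || p[1] != '#')
        return 0;
    p += 2;
    if (*p == 'x' || *p == 'X') {
        base = 16;
        p++;
    }
    for (;; p++, digits++) {
        uint32_t d;
        if (*p >= '0' && *p <= '9') d = (uint32_t)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f') d = (uint32_t)(*p - 'a' + 10);
        else if (base == 16 && *p >= 'A' && *p <= 'F') d = (uint32_t)(*p - 'A' + 10);
        else break;
        value = value * base + d;
        if (value > 0x10FFFF)      /* out of range; also prevents overflow */
            return 0;
    }
    if (digits == 0 || *p != ';')
        return 0;
    *cp = value;
    return (size_t)(p - s) + 1;
}
```

Both &#252; and &#xFC; parse to U+00FC, which encodes to the two bytes C3 BC, i.e. the same ü that Firefox sends as %C3%BC.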

For a list of named entities see http://www.w3.org/TR/html4/sgml/entities.html
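Named entities only need a lookup table. Below is a rough C sketch that maps entity names straight to their UTF-8 bytes and leaves unknown references untouched (as Wget 1.x does today). The table is a hypothetical, deliberately tiny subset of the W3C list. Note that &amp; is the most common case in practice, because an '&' in an HTML attribute value (e.g. a query string) is written as &amp;.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tiny subset of the HTML 4 named entities, mapped directly
 * to UTF-8. A real table would cover the whole W3C list. */
static const struct {
    const char *name;
    const char *utf8;
} named_entities[] = {
    { "amp",  "&" },
    { "lt",   "<" },
    { "gt",   ">" },
    { "quot", "\"" },
    { "auml", "\xC3\xA4" },  /* U+00E4 */
    { "ouml", "\xC3\xB6" },  /* U+00F6 */
    { "uuml", "\xC3\xBC" },  /* U+00FC */
};

/* Copy `src` to `dst`, replacing known "&name;" references (matched
 * case-sensitively) with their UTF-8 bytes; anything else is copied as-is.
 * Returns 0 on success, -1 if `dstsize` is too small. */
static int decode_named_entities(const char *src, char *dst, size_t dstsize)
{
    const size_t n = sizeof named_entities / sizeof named_entities[0];
    size_t o = 0;

    while (*src) {
        const char *rep = NULL;
        size_t skip = 1;

        if (*src == '&') {
            const char *semi = strchr(src + 1, ';');
            if (semi) {
                size_t len = (size_t)(semi - (src + 1));
                for (size_t i = 0; i < n; i++) {
                    if (strlen(named_entities[i].name) == len &&
                        memcmp(named_entities[i].name, src + 1, len) == 0) {
                        rep = named_entities[i].utf8;
                        skip = len + 2;          /* '&' + name + ';' */
                        break;
                    }
                }
            }
        }
        if (rep) {
            size_t rlen = strlen(rep);
            if (o + rlen >= dstsize) return -1;
            memcpy(dst + o, rep, rlen);
            o += rlen;
        } else {
            if (o + 1 >= dstsize) return -1;
            dst[o++] = *src;
        }
        src += skip;
    }
    if (o >= dstsize) return -1;
    dst[o] = '\0';
    return 0;
}
```

Decoding "/&uuml;.txt" gives "/ü.txt" as UTF-8; percent-encoding those bytes afterwards yields "/%C3%BC.txt", the same request Firefox makes.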