Wget2 performs incorrect IRI-encoding of GET URLs
Example URL: "http://git.savannah.gnu.org/gitweb/?p=autoconf-archive.git;a=blob_plain;f=m4/ax_cxx_compile_stdcxx.m4"
Using Wget2, this creates a GET request with the following URI:
/gitweb/?p=autoconf-archive.git%3Ba=blob_plain%3Bf=m4%2Fax_cxx_compile_stdcxx.m4
This URI is incorrect according to both the old HTTP/1.1 spec (RFC 2616) and the new one (RFC 7230). Both specs state that, when comparing two URIs, reserved characters are not equivalent to their percent-encoded forms, so a client must not convert one into the other.
Here, the ; is a reserved character, and hence the server is entirely correct in returning an HTTP 404 Not Found response.
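To illustrate the difference, here is a minimal Python sketch using the standard library's urllib.parse.quote. It is not Wget2's code; the helper name encode_query and the RESERVED constant are hypothetical and only used for this example. It shows that escaping everything except "=" reproduces the query string Wget2 sends, while treating the RFC 3986 reserved set as safe leaves the original query intact.

```python
from urllib.parse import quote

# Reserved characters per RFC 3986, section 2.2 (gen-delims + sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="


def encode_query(query: str) -> str:
    """Hypothetical helper: percent-encode a query string while leaving
    reserved characters and already-present "%" escapes untouched."""
    return quote(query, safe=RESERVED + "%")


query = "p=autoconf-archive.git;a=blob_plain;f=m4/ax_cxx_compile_stdcxx.m4"

# Reserved characters stay as they are, so the query is unchanged.
preserved = encode_query(query)

# Escaping everything except "=" turns ";" into %3B and "/" into %2F,
# which is the query string seen in the request above.
over_encoded = quote(query, safe="=")

print(preserved)
print(over_encoded)
```

Characters that really do need escaping, such as a space, are still encoded by the helper, so only the treatment of reserved characters differs between the two calls.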
Also, Wget 1.x does the correct thing and does not percent-encode the ; characters.