Dealing with 301 Redirects and Recursion
So, I came across an interesting scenario earlier today. I was trying to recursively download a website with --no-parent -r.
The website is hosted using nginx and uses the autoindex module, which means a request for /foo/ will automatically generate a directory index page and serve it.
The exact command I used was wget2 --no-parent -r example.com/foo/bar, where bar is a directory. So, as expected, the server responds with a 301 Moved Permanently to /foo/bar/ and then serves the generated index page.
However, since wget2 doesn't adopt the new location from the redirect for the download, the IRI still contains /foo/bar as the location. Because that path has no trailing slash, its directory part is /foo/, so all files in /foo/ are also considered part of the current directory even though they really belong to the parent.
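A minimal Python sketch of why this happens, assuming a simplified no-parent check that accepts any URL whose path starts with the directory part of the start URL (everything up to and including the last "/"). The helpers parent_dir and is_below are hypothetical illustrations, not wget2's actual code, and the host comparison is left out.

```python
from urllib.parse import urlsplit


def parent_dir(url: str) -> str:
    """Hypothetical helper: directory prefix used for a --no-parent style check.

    Everything up to and including the last '/' of the path is the directory;
    a path without a trailing slash is treated as a file inside its parent.
    """
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def is_below(base_url: str, candidate_url: str) -> bool:
    """Hypothetical no-parent test: candidate must live under base's directory."""
    return urlsplit(candidate_url).path.startswith(parent_dir(base_url))


# Start URL as typed, before the 301 to /foo/bar/ is taken into account.
base = "http://example.com/foo/bar"
print(parent_dir(base))                                    # /foo/
print(is_below(base, "http://example.com/foo/other.txt"))  # True: a sibling of bar leaks in
```

With the trailing slash (http://example.com/foo/bar/) the same check yields /foo/bar/ as the directory, and /foo/other.txt is rejected.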
Now, this can easily be dealt with if we simply update the IRI to the new one. But that may cause interesting side effects, so I want to discuss this here before making any changes.
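A rough sketch of the proposed change under the same simplified model: when the response is a 301, resolve the Location header against the requested URL and use the result as the new base for the no-parent check. base_after_redirect is a hypothetical helper; it only covers the 301 case from this scenario and makes no attempt to handle the possible side effects mentioned above.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def parent_dir(url: str) -> str:
    """Hypothetical helper: path up to and including the last '/'."""
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def base_after_redirect(request_url: str, status: int, location: Optional[str]) -> str:
    """Hypothetical: adopt the redirect target as the base IRI on a 301."""
    if status == 301 and location:
        # Location may be relative, so resolve it against the requested URL.
        return urljoin(request_url, location)
    # Any other response keeps the original IRI unchanged.
    return request_url


base = base_after_redirect("http://example.com/foo/bar", 301, "/foo/bar/")
print(base)              # http://example.com/foo/bar/
print(parent_dir(base))  # /foo/bar/
```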