Fix -p -nc behavior if document already exists

wget2 -p -nc example.com downloads and saves index.html even if it already exists.

The intended behavior is to first check whether the document already exists locally. If it does not, download and ... (this already works). If it does, send a HEAD request to find out whether it is a CSS or HTML file. If it is, load the file from disk and parse ...; if it is not, do nothing.
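The decision above can be sketched as follows. This is a hypothetical Python illustration, not wget2 code: `plan_nc_action`, the returned action strings and the `head_content_type` callback (standing in for the HEAD request) are all invented for the example.

```python
import os
from typing import Callable, Optional

# MIME types whose local copy must be scanned for page requisites (-p).
PARSEABLE_TYPES = {"text/html", "application/xhtml+xml", "text/css"}


def plan_nc_action(path: str, head_content_type: Callable[[], Optional[str]]) -> str:
    """Decide what -p -nc should do for one URL saved locally as `path`.

    `head_content_type` is a hypothetical callback that performs the HEAD
    request and returns the Content-Type header, or None if there is none.
    """
    if not os.path.exists(path):
        return "download"  # existing behavior: fetch, save, parse
    ctype = head_content_type()  # only issue the HEAD request if the file exists
    if ctype is None:
        return "skip"
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = ctype.split(";", 1)[0].strip().lower()
    if mime in PARSEABLE_TYPES:
        return "parse-local"  # load the file from disk and scan it for links
    return "skip"  # not HTML/CSS: nothing to do
```

Passing the HEAD request in as a callback keeps the decision separate from the network code, so the same logic could later be fed from a stored content type or a heuristic instead.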

Instead of the HEAD request, we could also open the document and do a heuristic check. See https://www.w3.org/TR/2011/WD-html5-20110113/parsing.html#determining-the-character-encoding
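The linked section describes an encoding prescan over the first bytes of a document. A much simpler heuristic for deciding "is this saved file HTML?" could look at the first non-whitespace bytes for typical opening tags. The sketch below is a hypothetical illustration; the prefix list and the name `looks_like_html` are assumptions, and it does not attempt to recognize CSS.

```python
# Opening tags that, after leading whitespace, suggest an HTML document.
# This list is an assumption chosen for illustration.
HTML_PREFIXES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<title", b"<script")
# Bytes accepted right after a prefix, so that "<htmlx>" does not match "<html".
TAG_END = b" >\t\r\n\x0c"


def looks_like_html(data: bytes) -> bool:
    """Guess whether the first bytes of a local file look like HTML."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    head = data.lstrip(b" \t\r\n\x0c").lower()
    for prefix in HTML_PREFIXES:
        if head.startswith(prefix):
            nxt = head[len(prefix):len(prefix) + 1]
            if len(nxt) == 1 and nxt in TAG_END:
                return True
    return False


def sniff_file_is_html(path: str, limit: int = 512) -> bool:
    """Read at most `limit` bytes of `path` and apply looks_like_html()."""
    with open(path, "rb") as f:
        return looks_like_html(f.read(limit))
```

A heuristic like this avoids the extra round trip, at the cost of occasional misclassification, which is why it fits best as a fallback rather than the primary check.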

Also, we could adapt the xattr feature from Wget 1.x (we should do that anyway) and save the original content type as an xattr; that would be pretty elegant. Only if that information is not available should we fall back to a HEAD request and/or heuristics.
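Storing and reading back the content type could look roughly like this Python sketch (Linux only, since it uses `os.setxattr`/`os.getxattr`). The attribute name `user.mime_type` is an assumption borrowed from the freedesktop.org extended-attributes convention, and both helper names are hypothetical.

```python
import os
from typing import Optional

# Assumed attribute name (freedesktop.org convention); not decided for wget2.
MIME_XATTR = "user.mime_type"


def save_content_type(path: str, content_type: str) -> bool:
    """Record the server's Content-Type on the saved file.

    Returns False if the platform or filesystem does not support user xattrs.
    """
    try:
        os.setxattr(path, MIME_XATTR, content_type.encode("utf-8"))
        return True
    except (OSError, AttributeError):  # AttributeError: no xattr API on this OS
        return False


def load_content_type(path: str) -> Optional[str]:
    """Return the stored Content-Type, or None so the caller can fall back
    to a HEAD request and/or heuristics."""
    try:
        return os.getxattr(path, MIME_XATTR).decode("utf-8")
    except (OSError, AttributeError):  # attribute missing or xattrs unsupported
        return None
```

Returning None on any failure keeps the lookup cheap and lets the caller treat "not stored" and "not supported" the same way, which matches the fallback order described above.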
