Fix -p -nc behavior if document already exists
wget2 -p -nc example.com
downloads and saves index.html even if it already exists.
The intended behavior would be to check whether the document exists. If it does not: download and ... (this already works). If it exists: do a HEAD request to see whether it is a CSS / HTML file. If it is, load the file from disk and parse ... If it is not, do nothing.
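To make the branching above concrete, here is a minimal C sketch of the decision only. All names (`nc_action`, `nc_decide`, `media_type_is`) are hypothetical and not taken from the wget2 sources, and it assumes the Content-Type string has already been obtained somehow (for example from the HEAD response).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical names for illustration, not part of wget2. */
enum nc_action {
    NC_DOWNLOAD,     /* file missing: fetch it as usual */
    NC_PARSE_LOCAL,  /* file present and HTML/CSS: parse the copy on disk */
    NC_SKIP          /* file present, any other type: do nothing */
};

/* True if content_type names the media type `type`, ignoring case and
 * any trailing parameters such as "; charset=utf-8". */
static bool media_type_is(const char *content_type, const char *type)
{
    size_t n = strlen(type);

    if (strncasecmp(content_type, type, n) != 0)
        return false;

    /* The media type must end here, not merely share a prefix. */
    char next = content_type[n];
    return next == '\0' || next == ';' || next == ' ' || next == '\t';
}

/* content_type may be NULL if it could not be determined. */
enum nc_action nc_decide(bool exists, const char *content_type)
{
    if (!exists)
        return NC_DOWNLOAD;

    if (content_type &&
        (media_type_is(content_type, "text/html") ||
         media_type_is(content_type, "text/css")))
        return NC_PARSE_LOCAL;

    return NC_SKIP;
}
```

An unknown content type is treated as "do nothing" here; whether that is the right default is open for discussion.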
Instead of the HEAD request, we could also open the document and do a heuristic check. See https://www.w3.org/TR/2011/WD-html5-20110113/parsing.html#determining-the-character-encoding
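A rough idea of what such a heuristic could look like, in C. This is an assumption-laden sketch, not the algorithm from the linked specification: it only looks at the start of the buffer for an HTML doctype or `<html` tag, and it does not attempt to recognize CSS at all. The function name `looks_like_html` is made up for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: guess whether the first `len` bytes of a file
 * look like the beginning of an HTML document. */
bool looks_like_html(const char *buf, size_t len)
{
    static const char *const markers[] = { "<!doctype html", "<html" };
    size_t i = 0;

    /* Skip a UTF-8 byte order mark, if present. */
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        i = 3;

    /* Skip leading ASCII whitespace. */
    while (i < len && isspace((unsigned char)buf[i]))
        i++;

    /* Case-insensitive match against the known markers. */
    for (size_t m = 0; m < sizeof markers / sizeof markers[0]; m++) {
        size_t n = strlen(markers[m]);

        if (len - i >= n && strncasecmp(buf + i, markers[m], n) == 0)
            return true;
    }

    return false;
}
```

The caller would read the first few hundred bytes of the existing file and pass them in. HTML fragments without a doctype or `<html` tag would be missed by this check.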
Also, we could adapt the xattr feature from Wget1.x (we should do that anyway) and save the original content-type as an xattr; that would be pretty elegant. Only if that info is not available should we go ahead with HEAD and/or heuristics.
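For illustration, storing and reading back the content type via extended attributes could look roughly like this on Linux. This is a sketch under assumptions: the attribute name `user.mime_type` and the function names are chosen for the example and are not confirmed to match what Wget1.x writes, and the filesystem must support user xattrs.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>  /* Linux: setxattr(), getxattr() */

/* Assumed attribute name, chosen for this example. */
#define XATTR_MIME "user.mime_type"

/* Store the content type on the saved file. Returns 0 on success,
 * -1 on error (e.g. xattrs not supported by the filesystem). */
int save_content_type(const char *path, const char *content_type)
{
    return setxattr(path, XATTR_MIME, content_type, strlen(content_type), 0);
}

/* Read the stored content type into buf as a NUL-terminated string.
 * Returns 0 on success, -1 if the attribute is missing or unreadable,
 * in which case the caller falls back to HEAD and/or heuristics. */
int load_content_type(const char *path, char *buf, size_t bufsize)
{
    /* Need room for at least one byte plus the terminator; a size of 0
     * would make getxattr() only report the attribute length. */
    if (bufsize < 2)
        return -1;

    ssize_t n = getxattr(path, XATTR_MIME, buf, bufsize - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return 0;
}
```

With something like this in place, the existing-file case would first try `load_content_type()` and only issue a HEAD request or sniff the file when it returns -1.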