Honor HTTP 301 by caching the permanent redirects
First Run:
ubuntu@rootkea-wget2-5055322:~/workspace/testing$ wget2 https://facebook.com
[0] Downloading 'https://facebook.com' ...
HTTP response 301 [https://facebook.com]
[0] Downloading 'https://www.facebook.com/' ...
HTTP response 302 [https://www.facebook.com/]
[0] Downloading 'https://www.facebook.com/unsupportedbrowser' ...
Saving 'index.html.8'
HTTP response 200 [https://www.facebook.com/unsupportedbrowser]
Second Run:
ubuntu@rootkea-wget2-5055322:~/workspace/testing$ wget2 -d https://facebook.com
24.104800.288 Local URI encoding = 'UTF-8'
24.104800.288 Input URI encoding = 'UTF-8'
24.104800.288 add HSTS www.facebook.com:443 (maxage=15552000, includeSubDomains=0)
24.104800.288 add HSTS facebook.com:443 (maxage=15552000, includeSubDomains=0)
24.104800.288 Fetched HSTS data from '/home/ubuntu/.wget-hsts'
24.104800.289 add HPKP projects.dm.id.lv (maxage=31536000, includeSubDomains=1)
24.104800.289 Fetched HPKP data from '/home/ubuntu/.wget-hpkp'
24.104800.289 add TLS session data for www.facebook.com (maxage=64800, size=3729)
24.104800.289 add TLS session data for facebook.com (maxage=64800, size=3725)
24.104800.289 add TLS session data for localhost (maxage=64800, size=1303)
24.104800.289 Fetched TLS session data from '/home/ubuntu/.wget-session'
24.104800.289 add OCSP host www.facebook.com (maxage=1500896864)
24.104800.289 add OCSP host facebook.com (maxage=1500896864)
24.104800.289 Fetched OCSP hosts from '/home/ubuntu/.wget-ocsp_hosts'
24.104800.289 add OCSP cert 152151b387412a95ab90fd4664f2d98880408e9e43913124e4c91226a483386b (maxage=1500895983,valid=1)
24.104800.289 add OCSP cert 19400be5b7a31fb733917700789d2f0a2471c0c9d506c0e504c06c16d7cb17c0 (maxage=1500895983,valid=1)
24.104800.289 Fetched OCSP fingerprints from '/home/ubuntu/.wget-ocsp'
24.104800.289 *url = https://facebook.com
24.104800.289 *3 https://facebook.com
24.104800.289 local filename = 'index.html'
24.104800.289 host_add_job: job fname index.html
24.104800.289 host_add_job: 0x15483b0 https://facebook.com
24.104800.289 host_add_job: qsize 1 host-qsize=1
24.104800.289 queue_size: qsize=1
24.104800.289 queue_size: qsize=1
24.104800.289 queue_size: qsize=1
24.104800.289 [0] action=1 pending=0 host=0x0
24.104800.289 qsize=1 blocked=0
24.104800.289 pause=-1500893280289
24.104800.289 dequeue job https://facebook.com
24.104800.289 resolving facebook.com:443...
24.104800.294 has 31.13.95.36:443
24.104800.294 has 2a03:2880:f102:83:face:b00c:0:25de:443
24.104800.294 Add dns cache entry facebook.com:443
24.104800.294 trying 31.13.95.36:443...
24.104800.294 GnuTLS init
24.104800.312 Certificates loaded: 173
24.104800.313 GnuTLS init done
24.104800.313 TLS False Start requested
24.104800.313 ALPN offering h2
24.104800.313 ALPN offering http/1.1
24.104800.313 found cached session data for facebook.com
24.104800.342 host has no pubkey pinnings stored in hpkp db
24.104800.342 Certificate[0] of 'facebook.com' is valid (cached)
24.104800.342 Certificate[1] of 'facebook.com' is valid (cached)
24.104800.342 update OCSP host facebook.com (maxage=1500896880)
24.104800.342 TLS False Start: off
24.104800.342 ALPN: Server accepted protocol 'h2'
----
Certificate info [0]:
Valid since: Fri Dec 9 00:00:00 2016
Expires: Thu Jan 25 12:00:00 2018
Fingerprint: b6d42815aa9efc1100380c2650b4c60c
Serial number: b6d42815aa9efc1100380c2650b4c60c
Public key: EC/ECDSA, High (256 bits)
Version: #3
DN: C=US,ST=California,L=Menlo Park,O=Facebook\, Inc.,CN=*.facebook.com
Issuer's DN: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert SHA2 High Assurance Server CA
Issuer's OID: 2.5.4.6
Issuer's UID: 2.5.4.6
Certificate info [1]:
Valid since: Tue Oct 22 12:00:00 2013
Expires: Sun Oct 22 12:00:00 2028
Fingerprint: aaee5cf8b0d8596d2e0cbe67421cf7db
Serial number: aaee5cf8b0d8596d2e0cbe67421cf7db
Public key: RSA, Medium (2048 bits)
Version: #3
DN: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert SHA2 High Assurance Server CA
Issuer's DN: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert High Assurance EV Root CA
Issuer's OID: 2.5.4.6
Issuer's UID: 2.5.4.6
----
Ephemeral ECDH using curve (null)
Key Exchange: ECDHE-ECDSA
Protocol: TLS1.2
Certificate Type: X.509
Compression: NULL
Cipher: AES-128-GCM
MAC: AEAD
----
24.104800.343 Handshake completed (resumed session)
24.104800.343 established connection facebook.com
[0] Downloading 'https://facebook.com' ...
24.104800.343 cookie_create_request_header for host=facebook.com path=(null)
24.104800.343 HTTP2 stream id 1
24.104800.343 [0] action=1 pending=1 host=0x1548240
24.104800.343 qsize=1 blocked=0
24.104800.343 pause=-1500893280343
24.104800.343 [0] action=2 pending=1 host=0x1548240
24.104800.343 ## pending_requests = 1
24.104800.343 ## loop responses=0
24.104800.343 [FRAME 0] > SETTINGS
24.104800.343 [FRAME 1] > HEADERS
24.104800.343 [FRAME 1] > :method: GET
24.104800.343 [FRAME 1] > :path: /
24.104800.343 [FRAME 1] > :scheme: https
24.104800.343 [FRAME 1] > :authority: facebook.com
24.104800.343 [FRAME 1] > accept-encoding: gzip, deflate, bzip2, xz, lzma, br
24.104800.343 [FRAME 1] > accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
24.104800.343 [FRAME 1] > user-agent: wget2/1.0.0
24.104800.343 ## loop responses=0
24.104800.369 [FRAME 0] < SETTINGS
24.104800.369 [FRAME 0] < WINDOW_UPDATE
24.104800.369 ## loop responses=0
24.104800.369 [FRAME 0] > SETTINGS
24.104800.369 ## loop responses=0
24.104800.369 [FRAME 0] < SETTINGS
24.104800.369 ## loop responses=0
24.104800.370 [FRAME 1] < WINDOW_UPDATE
24.104800.370 :status: 301
24.104800.370 location: https://www.facebook.com/
24.104800.370 strict-transport-security: max-age=15552000; preload
24.104800.370 vary: Accept-Encoding
24.104800.370 cache-control: public, max-age=2592000
24.104800.370 content-type: text/plain
24.104800.370 content-length: 0
24.104800.370 server: proxygen
24.104800.370 date: Mon, 24 Jul 2017 10:48:00 GMT
24.104800.370 [FRAME 1] < HEADERS
24.104800.370 [FRAME 1] < DATA
24.104800.370 closing stream 1
24.104800.370 ## response status 301
HTTP response 301 [https://facebook.com]
...
From the first run, wget2 already knows that https://facebook.com permanently redirects (301) to https://www.facebook.com/. So why aren't we using that permanent redirect in later invocations?
IMO we should cache 301 redirects (much like ~/.wget-hsts) and use them. While caching, we also need to respect the Cache-Control header.
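To make the proposal concrete, here is a rough sketch of the lookup logic I have in mind. It is illustrative Python, not wget2 code: the names `cache_lifetime` and `RedirectCache` are hypothetical, the cache lives only in memory (persisting it to a file next to ~/.wget-hsts is left out), and a 301 without an explicit max-age is not cached by default, which is an assumption on my part rather than required behavior.

```python
import time


def cache_lifetime(cache_control):
    """Return the lifetime in seconds allowed by a Cache-Control value.

    Returns 0 if the response must not be stored or reused as-is
    (no-store / no-cache), and None if no max-age is given.
    """
    max_age = None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name in ("no-store", "no-cache"):
            return 0
        if name == "max-age":
            try:
                max_age = max(0, int(value.strip('"')))
            except ValueError:
                return 0  # malformed max-age: be conservative
    return max_age


class RedirectCache:
    """Hypothetical in-memory cache of permanent (301) redirects."""

    def __init__(self, default_lifetime=0):
        # Assumption: without an explicit max-age we do not cache at all.
        self.default_lifetime = default_lifetime
        self.entries = {}  # source URL -> (target URL, expiry timestamp)

    def store(self, url, status, location, cache_control="", now=None):
        """Remember a redirect; return True if it was cached."""
        if status != 301 or not location:
            return False
        now = time.time() if now is None else now
        lifetime = cache_lifetime(cache_control)
        if lifetime is None:
            lifetime = self.default_lifetime
        if lifetime <= 0:
            self.entries.pop(url, None)  # drop any stale entry
            return False
        self.entries[url] = (location, now + lifetime)
        return True

    def lookup(self, url, now=None):
        """Return the cached target for url, or url itself if none is valid."""
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is None:
            return url
        target, expires = entry
        if now >= expires:
            del self.entries[url]  # expired: fall back to the original URL
            return url
        return target


# Values taken from the 301 response in the debug log above.
cache = RedirectCache()
cache.store("https://facebook.com", 301, "https://www.facebook.com/",
            "public, max-age=2592000", now=0)
first_hop = cache.lookup("https://facebook.com", now=1)
```

With this, the second run could skip the request to https://facebook.com and start at `first_hop`. The 302 from https://www.facebook.com/ would still be followed over the network, since it is not a permanent redirect.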
Edited by Avinash Sonawane