make sure we respect feed efficiency principles
Apart from paged feeds (RFC5005, see #33), we should generally check feed2exec for efficiency. We don't want to be hammering those poor sites too badly.
Normally, we're doing pretty well already, as we have an HTTP-level cache that checks headers and doesn't pull the feed if it is unchanged. But there are many more tricks documented in https://www.earth.org.uk/RSS-efficiency.html that we should look into. The TL;DR:
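That HTTP-level cache amounts to conditional GET: replay the validators saved from the previous response and skip the fetch/parse entirely on a 304. A minimal sketch of the idea (the function name and state shape here are hypothetical, not feed2exec's actual code):

```python
def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server answer 304 Not Modified.

    etag and last_modified are the values saved from the previous
    response's ETag and Last-Modified headers, if any were present.
    """
    headers = {}
    if etag:
        # If-None-Match carries the saved entity tag back to the server
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

Used with something like `requests.get(url, headers=conditional_headers(saved_etag, saved_lm))`, a `response.status_code == 304` means the cached copy is still good and nothing needs to be re-parsed.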
1. use `Cache-Control` `max-age` HTTP headers for a "do not poll again before" time: savings of 10x or much more are likely if the feed server is set up well: an unnecessary feed poll avoided entirely is the cheapest kind!
2. use a local cache and conditional GET (e.g. send `If-Modified-Since` and/or `ETag` HTTP headers): savings of 10x or more are likely
3. allow compression of the feed that you pull down (set `Accept-Encoding` HTTP headers) with at least gzip: savings of 2x to 10x are likely
4. avoid fetching the feed on `skipHours` (and/or `skipDays`) in an RSS feed: savings of 2x are plausible, and can be especially renewables/climate friendly
5. use error responses 429 ("Too Many Requests") and 503 ("Service Unavailable") `Retry-After` header for a "do not poll again before" time (like `Cache-Control` `max-age` above) when present: do NOT retry immediately/faster/repeatedly!
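Points 1 and 5 both reduce to computing a "do not poll again before" time from response headers. A rough sketch of how that could look (a hypothetical helper, not feed2exec code; the header parsing is deliberately simplified):

```python
import email.utils
import re
import time


def next_poll_delay(headers, now=None):
    """Return the minimum delay in seconds before this feed may be
    polled again, from Cache-Control: max-age and Retry-After headers.

    Returns 0.0 when neither header imposes a delay.
    """
    now = time.time() if now is None else now
    delay = 0.0
    # Cache-Control: max-age=N (simplified: ignores other directives)
    m = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if m:
        delay = max(delay, float(m.group(1)))
    # Retry-After: either delta-seconds or an HTTP-date
    ra = headers.get("Retry-After")
    if ra:
        if ra.strip().isdigit():
            delay = max(delay, float(ra))
        else:
            when = email.utils.parsedate_to_datetime(ra).timestamp()
            delay = max(delay, when - now)
    return delay
```

The scheduler would then record `now + next_poll_delay(...)` per feed and refuse to poll before that time, which is exactly the kind of per-feed state we currently don't keep.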
We might be doing some of this already (2 and 3 for example) but I suspect our scheduling is not respecting the others at all, as we don't keep much state of the remote feeds.
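For `skipHours`/`skipDays` (point 4), the check itself is trivial; the work is in keeping the parsed values around as per-feed state. A sketch, assuming the hours and days have already been extracted from the feed's `<skipHours>`/`<skipDays>` elements (RSS expresses these in GMT):

```python
import datetime


def should_skip(skip_hours, skip_days, when=None):
    """True if a poll should be skipped, given the feed's skipHours
    (integers 0-23) and skipDays ("Monday", "Tuesday", ...) values."""
    when = when or datetime.datetime.now(datetime.timezone.utc)
    if when.hour in skip_hours:
        return True
    if when.strftime("%A") in skip_days:
        return True
    return False
```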
Edited by Antoine Beaupré