warn instead of crashing on invalid dates

We used to crash outright when a feed had invalid or missing
dates. After reviewing the standards, it turns out this is not quite
valid behavior: RSS 0.90 and 0.91, for example, do not have per-item
dates at all. Yet it seems to me a valid feed should minimally include
*some* timestamps, and the more likely explanation for a missing
"parsed" field is that feedparser wasn't able to parse the feed
properly.

Therefore, turn this into a warning. This will be annoying as hell for
some users and feeds, unfortunately, but I don't think silently
ignoring those errors will be much better, as we *do* need a
timestamp (for example to generate valid emails) internally.

We fall back to the current time, for lack of a better alternative.
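
For illustration, the fallback order described above could be sketched
as follows. This is a minimal sketch, not the code from this commit:
`resolve_timestamp`, `ITEM_KEYS`, `FEED_KEYS` and the `now` parameter are
hypothetical names, plain dicts stand in for feedparser objects, and
unlike the chained `.get()` calls in the diff it also skips keys that
are present but set to `None`.

```python
import logging
import time

# Hypothetical helper, not part of feed2exec. The keys mirror the
# "*_parsed" fields used in the diff.
ITEM_KEYS = ('updated_parsed', 'published_parsed', 'created_parsed')
FEED_KEYS = ('updated_parsed', 'published_parsed')


def resolve_timestamp(item, feed, now=None):
    """Return the first usable parsed date, else the current UTC time."""
    # Item-level dates win, then feed-level dates.
    for source, keys in ((item, ITEM_KEYS), (feed, FEED_KEYS)):
        for key in keys:
            value = source.get(key)
            if value:  # skips missing keys and None values alike
                return value
    logging.warning('no parseable date found in feed item %s from feed %s, '
                    'using current time instead',
                    item.get('id'), feed.get('url'))
    # time.gmtime() returns a UTC struct_time, the same type as the
    # parsed dates; "now" exists only to make the sketch testable.
    return now if now is not None else time.gmtime()
```

Returning a `struct_time` in every branch means callers never have to
care whether the date came from the feed or from the clock.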

Closes: #7
parent 3dd1e2c0
@@ -321,6 +321,10 @@ class Feed(feedparser.FeedParserDict):
warnings.simplefilter("ignore")
item['updated_parsed'] = item.get('updated_parsed', item.get('published_parsed', item.get('created_parsed', self.get('updated_parsed', self.get('published_parsed', False))))) # noqa
assert item.get('updated_parsed') is not None
if not item.get('updated_parsed'):
logging.warning('no parseable date found in feed item %s from feed %s, using current time instead',
item.get('id'), self.get('url'))
item['updated_parsed'] = datetime.utcnow().timetuple()  # struct_time, like the other *_parsed values
# 2. add UID if missing (issue #112)
if not item.get('id'):
@@ -5,12 +5,44 @@ Content-Transfer-Encoding: 7bit
Date: Sun, 03 Sep 2017 09:03:54 -0000
To: to@example.com
From: weird-dates <to@example.com>
Subject: test item
Message-ID: http-example-com-test
Subject: missing date
Message-ID: http-example-com-test-missing-date
User-Agent: feed2exec (0.5.dev8+ng8893be0.d20170920)
Precedence: list
Auto-Submitted: auto-generated
Archived-At: http://example.com/test/
test descr1
This item has no date but there's one on the feed to fallback on
From weird-dates Sun Sep 3 09:03:54 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Date: Sun, 03 Sep 2017 09:03:54 -0000
To: to@example.com
From: weird-dates <to@example.com>
Subject: missing space
Message-ID: http-example-com-test-missing-space
Precedence: list
Auto-Submitted: auto-generated
Archived-At: http://example.com/test/
This item has a date that feedparser has trouble with, probably because of the missing space between the day of week and date
From weird-dates Sun Sep 3 09:03:54 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Date: Sun, 03 Sep 2017 09:03:54 -0000
To: to@example.com
From: weird-dates <to@example.com>
Subject: no timezone
Message-ID: http-example-com-test-no-timezone
Precedence: list
Auto-Submitted: auto-generated
Archived-At: http://example.com/test/
This item has a date that feedparser has trouble with, maybe because of the missing timezone
@@ -3,14 +3,30 @@
<channel>
<title>Test with weird dates.</title>
<link>http://example.com/</link>
<description>Test feed with only a date on the feed but not on items.</description>
<description>Test feed with weird date problems.</description>
<atom:link href="http://example.com/rss" rel="self"></atom:link>
<language>en-us</language>
<lastBuildDate>Sun, 03 Sep 2017 09:03:54 -0000</lastBuildDate>
<item>
<title>test item</title>
<title>missing date</title>
<link>http://example.com/test/</link>
<description type="text/plain">test descr1</description>
<guid>http://example.com/test/</guid>
<description type="text/plain">This item has no date but there's one on the feed to fallback on</description>
<guid>http://example.com/test-missing-date/</guid>
</item>
<item>
<title>missing space</title>
<link>http://example.com/test/</link>
<description type="text/plain">This item has a date that feedparser has trouble with, probably because of the missing space between the day of week and date</description>
<guid>http://example.com/test-missing-space/</guid>
<pubDate>Tue,19 Feb 2019 14:08:19 GMT</pubDate>
</item>
<item>
<title>no timezone</title>
<link>http://example.com/test/</link>
<description type="text/plain">This item has a date that feedparser has trouble with, maybe because of the missing timezone</description>
<guid>http://example.com/test-no-timezone/</guid>
<pubDate>Sun, 15 Feb 2015 00:00:00</pubDate>
</item>
</channel></rss>