Parsing XML files
When Scrapy crawls an XML file, the spider fails to extract some data and raises an exception:
2018-07-18 17:04:20 [scrapy.core.scraper] ERROR: Spider error processing <GET https://example.com/file.xml> (referer: https://example.com/)
Traceback (most recent call last):
  File "/home/julien/.pyenv/versions/crowltech/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/home/julien/.pyenv/versions/crowltech/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/home/julien/.pyenv/versions/crowltech/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/julien/.pyenv/versions/crowltech/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/julien/.pyenv/versions/crowltech/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/julien/.pyenv/versions/crowltech/lib/python3.6/site-packages/scrapy/spiders/crawl.py", line 78, in _parse_response
    for requests_or_item in iterate_spider_output(cb_res):
  File "/home/julien/crowl/crowl/spiders.py", line 51, in parse_url
    yield self.parse_item(response)
  File "/home/julien/crowl/crowl/spiders.py", line 86, in parse_item
    body_content = response.xpath('//body').extract()[0]
IndexError: list index out of range
As a result, https://example.com/file.xml is never saved in the database: the XML response has no <body> element, so `response.xpath('//body').extract()` returns an empty list and indexing it with `[0]` raises IndexError. We need to handle this case in `parse_item` so the data we did manage to process is still saved.
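One possible fix is sketched below. It assumes the Crowl code only needs the first match, so instead of indexing the extracted list it falls back to `None` when no `<body>` exists. The `FakeResponse` class is a hypothetical stand-in for a Scrapy response, here only so the sketch runs without Scrapy; the real `parse_item` would call the helper on the actual `response` object.

```python
class FakeResponse:
    """Hypothetical stand-in for scrapy.http.Response, used only to
    illustrate the pattern without a running crawler."""

    def __init__(self, matches):
        self._matches = matches

    def xpath(self, query):
        # A real response would evaluate the XPath query; the stub
        # just returns itself so .extract() can be chained.
        return self

    def extract(self):
        return self._matches


def extract_body(response):
    """Return the <body> markup, or None when the response has no
    <body> element (e.g. an XML file), instead of raising IndexError."""
    matches = response.xpath('//body').extract()
    return matches[0] if matches else None
```

Scrapy's own selectors also offer `extract_first()` (returning `None` by default when nothing matches), which achieves the same effect with less code; either way, `parse_item` can then store the item with a `None` body instead of crashing before the database write.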