feat(crawler): load Sitemap URLs
Feature request
1. Summary of expected behavior
As a consumer,
I want to collect sitemap URLs using only the hd-product-extractor,
in order to simplify usage and save time.
2. Acceptance criteria
We'll be done when hd-product-extractor:
- Exposes the Sitemap.load method.
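To make the criterion more concrete, here is a minimal sketch of what an exposed `Sitemap.load` could look like. The issue does not say where `Sitemap.load` comes from or what its signature is, so everything below is an assumption for illustration only: the `FetchText` type, the `extractLocs` helper, the `fetchText` parameter, and the shape of the returned object are hypothetical and are not taken from hd-product-extractor.

```typescript
// Hypothetical sketch: none of these names are confirmed by hd-product-extractor.

/** Fetches a URL and resolves to the response body as text (injected by the caller). */
type FetchText = (url: string) => Promise<string>

/** Pulls every <loc> value out of a sitemap XML string. */
function extractLocs(xml: string): string[] {
  const locs: string[] = []
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;", so undo that to get usable URLs.
    locs.push(match[1].replace(/&amp;/g, '&'))
  }
  return locs
}

class Sitemap {
  constructor(readonly urls: string[]) {}

  /** Loads one or more sitemaps and collects the URLs they list, without duplicates. */
  static async load(
    sitemapUrls: string | string[],
    fetchText: FetchText
  ): Promise<Sitemap> {
    const sources = Array.isArray(sitemapUrls) ? sitemapUrls : [sitemapUrls]
    const collected: string[] = []
    for (const source of sources) {
      const xml = await fetchText(source)
      for (const loc of extractLocs(xml)) {
        if (collected.indexOf(loc) === -1) {
          collected.push(loc)
        }
      }
    }
    return new Sitemap(collected)
  }
}
```

A consumer could then call `Sitemap.load(url, fetchText)` and read the result's `urls` before starting a crawl. Passing `fetchText` in keeps the sketch free of network dependencies; on a runtime with a global `fetch`, something like `(url) => fetch(url).then((res) => res.text())` would do. The sketch does not follow nested sitemap index files or handle CDATA-wrapped `<loc>` values.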
3. Notes
Fig. 1: Interface suggestions
TypeScript
import HomeDepotProductExtractor, {
  homeDepotCrawlerOptions
} from './crawlers/hd'

// Cap the crawl at 20 requests.
homeDepotCrawlerOptions.maxRequestsPerCrawl = 20

// Sitemap URL(s) passed to the extractor.
const urls = [
  'https://www.homedepot.com/sitemap/P/PIPs/21/021-001-0.xml'
]

const extractor = new HomeDepotProductExtractor(homeDepotCrawlerOptions)
await extractor.run(urls)
Edited by Zaira Ardor