PoC for our own Rich Embed Scraper
Goal
We should look into building an in-house rich-embed scraper so that we can generate our own previews for media.
- Should accept URLs with or without a protocol, and honor the canonical tag if present.
- Security should be thoroughly discussed in advance; the scraper should possibly be a dedicated, ring-fenced service.
- Should be locked down so that it can only be used from on-site.
- Should have functional parity with existing rich embed scraping solution.
- Should correctly handle SSR-enabled sites (www.minds.com).
- Should utilize caching.
- Should NOT allow access to paywalled, private, closed or otherwise restricted Minds content.
- Should discuss whether we allow generation of rich embeds for links that aren't available logged out.
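To make the URL-handling, security, and caching points above more concrete for discussion, here is a minimal TypeScript sketch. Every name in it (`normalizeUrl`, `findCanonical`, `isBlockedHost`, `TtlCache`) is hypothetical and not taken from any existing code. The canonical lookup uses a naive regex instead of a real HTML parser, and the host check only looks at literal hostnames; a real service would presumably also need to validate the resolved IP address and re-check any canonical URL before fetching it.

```typescript
/** Prepend https:// when the input has no protocol, then parse it. */
function normalizeUrl(input: string): URL {
  const trimmed = input.trim();
  const hasProtocol = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const url = new URL(hasProtocol ? trimmed : `https://${trimmed}`);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}

/** Return the page's canonical URL if declared, else the URL we fetched. */
function findCanonical(html: string, fetched: URL): URL {
  // Naive scan of <link> tags; a real scraper would use an HTML parser.
  for (const tag of html.match(/<link\b[^>]*>/gi) ?? []) {
    if (!/\brel\s*=\s*["']?canonical\b/i.test(tag)) continue;
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i);
    if (href && href[1]) return new URL(href[1], fetched);
  }
  return fetched;
}

/** Reject obviously internal targets (literal hostnames only, no DNS). */
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host.startsWith("[")) return true; // IPv6 literals blocked wholesale here
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

/** Tiny in-memory cache with a fixed time-to-live per entry. */
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired: drop and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

In this sketch the cache would be keyed on the canonical URL's `href`, so that `www.minds.com` and `https://www.minds.com/` share one entry. Whether the prototype uses an in-memory cache or a shared store is still open.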
Deliverable
A prototype exploring the above, to find out what pitfalls exist.
Resources to evaluate:
What needs to be done
QA
UX/Design
Personas
Experiments
Acceptance Criteria
- Implement https://metascraper.js.org/ as microservice
- Spec tests
- Feature flag for phased control
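As one possible reading of "phased control", the sketch below buckets users deterministically by hashing a flag name together with a user identifier, so a given user stays in or out of the rollout as the percentage grows. The function names, the flag name, and the choice of an FNV-1a hash (applied here to UTF-16 code units) are assumptions for illustration, not a description of any existing feature-flag system.

```typescript
/** FNV-1a 32-bit hash over UTF-16 code units: small, dependency-free, stable. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

/** True when the user falls inside the first `percent` of 100 buckets. */
function isInRollout(flag: string, userId: string, percent: number): boolean {
  if (percent <= 0) return false;
  if (percent >= 100) return true;
  return fnv1a(`${flag}:${userId}`) % 100 < percent;
}

// Hypothetical flag name; raising the percentage only ever adds users.
const useInHouseScraper = isInRollout("in-house-rich-embed", "user-123", 25);
```

Because the bucket depends only on the flag and user, anyone included at 25% is still included at 50%, which keeps previews consistent for a user during the phase-in.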
Definition of Ready Checklist
- Definition Of Done (DoD)
- Acceptance criteria
- Weighted
- QA
- UX/Design
- Personas
- Experiments
Edited by Ben