PoC for our own Rich Embed Scraper

Goal

We should look into building an in-house rich-embed scraper so that we can generate our own media previews.

Should accept URLs with or without a protocol, and honor the canonical tag if present.
Security should be thoroughly discussed in advance; this should possibly be a dedicated, ring-fenced service.
Should be locked down so that it can only be used from on-site.
Should have functional parity with the existing rich-embed scraping solution.
Should correctly handle SSR-enabled sites (www.minds.com).
Should utilize caching.
Should NOT allow access to paywalled, private, closed or otherwise restricted Minds content.
Should discuss whether we allow generation of rich embeds for links that aren't available when logged out.
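
As a starting point for the first requirement, here is a minimal sketch of URL normalization and canonical-tag resolution using only the Python standard library. The names `normalize_url`, `CanonicalLinkParser` and `resolve_canonical` are hypothetical and not part of any existing Minds code. Defaulting to `https` for protocol-less input is an assumption, and odd inputs such as `mailto:` links are not specifically handled.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


def normalize_url(raw: str, default_scheme: str = "https") -> str:
    """Return an absolute http(s) URL, adding a scheme when none was given."""
    candidate = raw.strip()
    if "://" not in candidate:
        # Assumption: protocol-less input such as "www.minds.com" means https.
        candidate = f"{default_scheme}://{candidate.lstrip('/')}"
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported or malformed URL: {raw!r}")
    return parts.geturl()


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        rels = attr_map.get("rel", "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonical = attr_map["href"]


def resolve_canonical(page_url: str, html: str) -> str:
    """Prefer the page's canonical URL, falling back to the fetched URL."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical is None:
        return page_url
    try:
        # Canonical hrefs may be relative, so resolve against the page URL.
        return normalize_url(urljoin(page_url, parser.canonical))
    except ValueError:
        return page_url
```

The canonical URL is also a natural cache key, since several tracking-parameter variants of a link then share one cached preview. Whether the canonical target needs to pass the same security checks as the original URL is something for the security discussion.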

Build a prototype exploring the above to find out what pitfalls exist.
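
One pitfall worth prototyping early is server-side request forgery: a scraper that fetches arbitrary user-supplied URLs can be pointed at internal services. The sketch below shows one possible pre-fetch guard that rejects hosts resolving to non-public addresses. The function names and the injectable `resolver` parameter are hypothetical, and this is an illustration for the security discussion rather than a complete defense.

```python
import ipaddress
import socket


def is_public_address(ip: str) -> bool:
    """True only for globally routable addresses (not loopback, private, link-local)."""
    try:
        return ipaddress.ip_address(ip).is_global
    except ValueError:
        # Unparseable addresses are treated as unsafe.
        return False


def host_is_fetchable(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Allow a fetch only if every address the host resolves to is public."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        return False
    # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the IP.
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A check like this is open to DNS rebinding unless the fetch then connects to the exact address that was validated, and redirects would need the same check on every hop. Running the scraper as a ring-fenced service with restricted egress would complement it rather than replace it.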

Edited Jul 04, 2022 by Ben
