Add a configuration option to prevent indexing by external search engine crawlers
As Terri,
I want the content of the OECD (pre-prod/prod) instance(s) of the Data Explorer & Viewer, while it is publicly available during a temporary beta period, not to be indexed by external search engine crawlers,
So that these OECD sites become discoverable by search engines only once the new front-end applications have proven ready to fully replace stats.oecd.org at an equivalent or better level of service, and can be merged with data.oecd.org into a single OECD sub-domain, or a sub-folder of oecd.org, hosted securely in an OECD-managed cloud environment. Domain names may therefore change when the beta phase ends, and an already indexed site would then become less discoverable, since search engines lower the visibility of redirected URLs.
Specifications
- Add appropriate metadata to the DE and Viewer pages/sites that makes external search engine crawlers skip them:
  - HTML page header, in all pages:
    <meta name="robots" content="noindex, nofollow">
  - HTTP response header, in all responses:
    X-Robots-Tag: noindex, nofollow
  - robots.txt file at the site root, containing:
    User-agent: *
    Disallow: /
- Make this an optional feature through configuration (default: off, i.e. the metadata is not emitted and the site remains indexable)
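The specification above could be sketched as a single config-gated helper that produces all three pieces of crawler-blocking metadata. This is a minimal illustration, not the actual implementation: the names `CrawlerBlockingConfig` and `buildCrawlerBlocking` are assumptions, and how the result is wired into page templates, HTTP responses, and the /robots.txt route is left to the application.

```typescript
// Illustrative sketch only: config shape and function names are assumptions.
interface CrawlerBlockingConfig {
  blockIndexing: boolean; // default false: site stays indexable
}

interface CrawlerBlocking {
  metaTag: string | null;              // to inject into <head> of every page
  httpHeaders: Record<string, string>; // to add to every HTTP response
  robotsTxt: string | null;            // to serve at the site root as /robots.txt
}

function buildCrawlerBlocking(cfg: CrawlerBlockingConfig): CrawlerBlocking {
  if (!cfg.blockIndexing) {
    // Feature disabled (the default): emit no metadata at all.
    return { metaTag: null, httpHeaders: {}, robotsTxt: null };
  }
  return {
    metaTag: '<meta name="robots" content="noindex, nofollow">',
    httpHeaders: { "X-Robots-Tag": "noindex, nofollow" },
    robotsTxt: "User-agent: *\nDisallow: /\n",
  };
}
```

Keeping the three artifacts behind one flag ensures they stay consistent: either all of them are emitted (beta period) or none of them are (after go-live), which matches the story's single on/off configuration switch.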