Provide a sitemap.xml file for public projects
## Problem statement

GitLab.com does not provide a sitemap.xml resource. Google currently discovers links to repositories via the https://gitlab.com/explore/projects endpoint. Many of these URLs result in error pages (e.g. https://gitlab.com/explore/projects?page=11961 returned error 500 as of 2019-11-20), and crawling the paginated explore pages is not an effective way to make projects discoverable.

## Job to be done

1. Block the `/explore/` section in the robots.txt file
1. Generate one or more sitemap.xml files containing a list of all the public repositories according to the specification: https://www.sitemaps.org/protocol.html
1. Update the sitemaps once every 24 hours
1. Ensure the file(s) are linked in the robots.txt file

## Notes

1. The maximum number of URLs per sitemap file is 50000. If multiple files are needed please follow this approach: https://support.google.com/webmasters/answer/75712?hl=en
1. https://support.google.com/webmasters/answer/183668?hl=en
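
To make the requirements above concrete, here is a minimal Python sketch of the file layout they imply: public project URLs split into sitemap files of at most 50000 entries, a `sitemap.xml` index pointing at those files, and a robots.txt that blocks `/explore/` and links the index. It is not GitLab's implementation. The function names, the `sitemap-N.xml` naming scheme, and the `base_url` parameter are all hypothetical, and the list of public project URLs is assumed to come from elsewhere.

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit quoted in the Notes section


def chunk(urls: list[str], size: int = MAX_URLS_PER_FILE):
    """Yield successive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls: list[str]) -> bytes:
    """Serialize one sitemap file (<urlset>) listing the given URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_index(sitemap_urls: list[str], lastmod: date) -> bytes:
    """Serialize a sitemap index (<sitemapindex>) pointing at each sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_sitemaps(project_urls: list[str], base_url: str, today: date) -> dict[str, bytes]:
    """Return {filename: content} for every sitemap file plus the index.

    The file names (sitemap-1.xml, ..., sitemap.xml) are an assumed convention.
    """
    files: dict[str, bytes] = {}
    for number, part in enumerate(chunk(project_urls), start=1):
        files[f"sitemap-{number}.xml"] = build_urlset(part)
    index_entries = [f"{base_url}/{name}" for name in files]
    files["sitemap.xml"] = build_index(index_entries, today)
    return files


def robots_txt(base_url: str) -> str:
    """Block /explore/ for all crawlers and advertise the sitemap index."""
    return "\n".join([
        "User-agent: *",
        "Disallow: /explore/",
        f"Sitemap: {base_url}/sitemap.xml",
        "",
    ])
```

A job scheduled once every 24 hours could call `build_sitemaps(...)` with the current list of public project URLs and write the returned files to wherever static assets are served. Always emitting an index, even when a single sitemap file would suffice, keeps the robots.txt entry stable as the number of projects grows.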
issue