Evaluate (continuous) URL linting and external checks for handbook and website
Problem to solve
The handbook and website contain broken URLs, and some are reported manually by team members and the wider community. Currently, there is little visibility into broken URLs in the handbook and website, except through Google Analytics tools that are not directly visible in merge requests or CI/CD feedback.
Context
The handbook has linters that check relative URLs and anchors, i.e. the "inner" URLs; a cleanup is tracked in #8514. This issue is to research the problem and provide helpful ideas for www-gitlab-com. If it does not work out, it does not pollute the handbook linter issue.
I came across the problem with blog posts whose broken links should have been caught by CI/CD URL linting, and have been thinking about the problem since 2020. One further problem is the external view: parse a handbook page and try to open all external URLs. These can be other websites, Google Docs, (confidential) issues, etc. The idea is to open the targets as an HTTP client and determine broken links.
All of these checks should run automatically, providing an overview of what to fix (or even generating fix suggestions/MRs, if that becomes a use case for MLOps later).
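The external check described above could be sketched with nothing but the standard library. This is a minimal illustration, not the handbook's actual tooling: the function names, the URL regex, and the User-Agent string are all assumptions made for the example.

```python
import re
import urllib.error
import urllib.request

# Rough pattern for external URLs in Markdown/HTML source; stops at
# whitespace and common closing delimiters. Illustrative only.
URL_RE = re.compile(r'https?://[^\s\)\]">]+')

def extract_external_urls(markdown: str) -> list:
    """Collect candidate external URLs from a page, deduplicated and
    stripped of trailing sentence punctuation."""
    return sorted({u.rstrip('.,;:') for u in URL_RE.findall(markdown)})

def check_url(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status.

    Tries HEAD first (cheap), falls back to GET for servers that
    reject HEAD requests."""
    for method in ("HEAD", "GET"):
        req = urllib.request.Request(
            url, method=method,
            headers={"User-Agent": "link-check-sketch/0.1"})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status < 400
        except urllib.error.HTTPError as err:
            if method == "GET":
                return err.code < 400  # 4xx/5xx on GET: broken
        except urllib.error.URLError:
            return False  # DNS failure, refused connection, etc.
    return False

page = "See [docs](https://docs.gitlab.com) and https://example.com/404."
print(extract_external_urls(page))
# → ['https://docs.gitlab.com', 'https://example.com/404']
```

A real checker would also need retries, rate limiting, and an allowlist for hosts that block bots, which is why evaluating existing tools (next section) is part of the proposal.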
Note: I had this idea 2.5 years ago, found no DRIs for www-gitlab-com, and gave up. Trying it again.
Proposal
Review the existing handbook linting features for links and anchors in #8514. From a first peek, it seems that the rake tasks are to be run manually and are not enabled in CI/CD.
An additional task is working out how to check external URLs. That applies to both the handbook and the website.
- Evaluate possible tools and methods for external URL checks (in Markdown, etc.)
- Benchmark the automation as CI/CD schedules, or inside pipelines that check only changed files (analyse the added pipeline time)
- Identify deployments where correct URLs are critical (all of them, but blog posts, for example)
- Repurpose the learnings into a content piece: Markdown lint and URL checks in CI/CD
- Document the manual commands in the handbook, so that everyone can contribute
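As a starting point for the benchmark item, a scheduled job could run an off-the-shelf link checker outside regular MR pipelines. The fragment below is an illustrative sketch, not a working configuration: lychee is one candidate tool among several, and the job name, image tag, and glob patterns are assumptions.

```yaml
# Illustrative .gitlab-ci.yml fragment: run the external link check
# only on scheduled pipelines, so MR pipelines are not slowed down.
link-check:
  image: lycheeverse/lychee:latest   # hypothetical choice of checker
  script:
    - lychee --no-progress 'sites/**/*.md'
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  allow_failure: true   # start non-blocking; tighten once results are clean
```

Starting with `allow_failure: true` gives the overview of broken URLs first, before deciding whether the check should ever block a pipeline.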
Additional ideas
- Consider adding website monitoring that verifies the live website and reports broken URLs too. This could potentially feed SLOs and targets for critical pages.
- Create a report of external URLs and find duplicates that could be replaced, for example with URL shorteners.
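The duplicate report could start as a simple frequency count over all pages. A minimal sketch, with illustrative function names and an assumed page-to-text mapping as input:

```python
import re
from collections import Counter

URL_RE = re.compile(r'https?://[^\s\)\]">]+')

def duplicate_url_report(pages: dict, min_count: int = 2) -> dict:
    """Count how often each external URL appears across pages and
    return only the URLs that repeat (candidates for consolidation)."""
    counts = Counter(
        u.rstrip('.,;:')
        for text in pages.values()
        for u in URL_RE.findall(text))
    return {url: n for url, n in counts.items() if n >= min_count}

pages = {
    "a.md": "See https://example.com/guide and https://example.com/guide",
    "b.md": "Also https://example.com/guide plus https://other.example/x",
}
print(duplicate_url_report(pages))  # → {'https://example.com/guide': 3}
```

In practice the input would come from walking the repository's Markdown files; near-duplicates (trailing slashes, `http` vs `https`) would need normalisation before counting.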