What guidance can we give for tackling the different errors? Maybe just tackle the obvious ones first, then re-run and look more carefully at the tricky ones?
Or ask for contributors to call out any challenging ones?
Latest list of 302s as of 27/07/22
Charts (WIP MR)
public/charts/charts/globals.html: [ ERROR ] external_links - broken reference to https://www.haproxy.com/blog/haproxy/proxy-protocol/: link has moved permanently to 'http://www.haproxy.com/blog/use-the-proxy-protocol-to-preserve-a-clients-ip-address/'
public/operator/adr/0001-record-architecture-decisions.html: [ ERROR ] external_links - broken reference to https://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions: link has moved permanently to 'https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions'
Runner (WIP MR)
public/runner/executors/custom_examples/libvirt.html: [ ERROR ] external_links - broken reference to https://docs.gitlab.com/ee/ssh/: link has moved permanently to 'https://docs.gitlab.com/ee/user/ssh.html'
public/runner/executors/kubernetes.html: [ ERROR ] external_links - broken reference to https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/: link has moved permanently to 'https://kubernetes.io/docs/concepts/windows/intro/'
public/runner/install/windows.html: [ ERROR ] external_links - broken reference to https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.diagnostics/get-winevent?view=powershell-7.1: link has moved permanently to 'https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.diagnostics/get-winevent?view=powershell-7.2&viewFallbackFrom=powershell-7.1'
@sselhorn I've started putting this together... My thinking is that contributors can "claim" a batch at a time?
We just need to provide a few pointers. I wonder whether, if I attack batch 1 myself, I'll find enough variety to comment on the different scenarios?
Here's my attempt at thinking out loud while reviewing the 1st batch. Can I/we boil this down to some basic steps/suggestions/guidelines?
- public/charts/charts/globals.html: https://www.haproxy.com/blog/haproxy/proxy-protocol/: link has moved permanently to 'http://www.haproxy.com/blog/use-the-proxy-protocol-to-preserve-a-clients-ip-address/'
I've skipped this one for the moment as it's in a different repo. I think just this one slipped through?
This was a nice perplexing one... Hitting the URL did indeed return a 404, but still loaded the page AOK.
It wasn't immediately apparent what was going on, so I found the gitlab link on the navigation menu and followed it. Very subtle... but the link was missing the trailing forward slash... e.g. https://casdoor.org/docs/integration/gitlab
- public/ee/administration/geo/disaster_recovery/index.html: https://docs.microsoft.com/en-us/azure/postgresql/howto-read-replicas-portal: link has moved permanently to 'https://docs.microsoft.com/en-us/azure/postgresql/single-server/how-to-read-replicas-portal'
This one was reasonably straightforward, but the link was to an anchor, so I had to visit the page to ensure the anchor was still valid.
- public/ee/administration/geo/replication/object_storage.html: https://cloud.google.com/storage-transfer/docs/: link has moved permanently to 'https://cloud.google.com/storage-transfer/docs/overview'
I'm not seeing the same response. I wonder if this is a false positive?
- public/ee/administration/integration/mailgun.html:
  - https://app.mailgun.com/app/account/security/api_keys: link has moved permanently to 'https://login.mailgun.com/login'
  - https://www.mailgun.com/blog/a-guide-to-using-mailguns-webhooks/: link has moved permanently to 'https://www.mailgun.com/blog/product/a-guide-to-using-mailguns-webhooks/'
- public/ee/administration/monitoring/performance/grafana_configuration.html:
  - https://grafana.com/docs/grafana/latest/reference/export_import/: link has moved permanently to 'http://grafana.com/docs/grafana/v7.5/dashboards/export-import/'
  - https://grafana.com/docs/grafana/latest/installation/: link has moved permanently to 'http://grafana.com/docs/grafana/next/setup-grafana/installation/'
Here, following the links did indeed redirect to the addresses reported, BUT the first redirect landed on a version-specific v7.5 doc, which I didn't think we wanted, so I followed another link through to the latest doc to get the correct URL.
Similarly, the latter link redirected to the "next" version docs, but we wanted the latest version, so "next" needed replacing with "latest" in the URL.
- public/ee/administration/monitoring/performance/performance_bar.html: https://developers.google.com/web/fundamentals/performance/critical-rendering-path/measure-crp: link has moved permanently to 'https://web.dev/critical-rendering-path-measure-crp/'
I had to google this one to find the new URL.
Google turned up something similar looking, but "we" don't think it is quite the same thing, so have removed the link.
- public/ee/administration/postgresql/external.html: https://docs.microsoft.com/en-us/azure/postgresql/howto-create-users: link has moved permanently to 'https://docs.microsoft.com/en-us/azure/postgresql/single-server/how-to-create-users'
The redirect was correct, and I just had to check the anchor was still valid.
- public/ee/administration/raketasks/github_import.html: https://docs.github.com/en/rest/reference/rate-limit: link has moved permanently to 'https://docs.github.com/en/rest/rate-limit'
This was a simple switch.
- public/ee/administration/raketasks/storage.html: https://support.gitlab.com: link has moved permanently to 'https://support.gitlab.com/hc'
I'm unsure whether we should update this one or not (we may even want to consider adding support.gitlab.com to our exclusion list?). My concern is that the redirection is based on the client (e.g. I get taken to https://support.gitlab.com/hc/en-us).
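To help boil the notes above down into guidelines, here is a minimal sketch of a triage helper. It only encodes the patterns seen in this first batch (login redirects, trailing slashes, "latest" being swapped for a pinned or "next" version, and a bare host redirecting to a client-specific path). The function name and the category strings are hypothetical, and the heuristics are assumptions rather than agreed rules.

```python
from urllib.parse import urlsplit


def triage_redirect(old_url: str, new_url: str) -> str:
    """Suggest a next step for one reported redirect (heuristics only)."""
    old, new = urlsplit(old_url), urlsplit(new_url)

    # A redirect to a login page says nothing about where the content moved.
    if new.netloc.startswith("login.") or new.path.rstrip("/").endswith("/login"):
        return "manual check: redirects to a login page"

    # Same host and path apart from a trailing slash.
    if old.netloc == new.netloc and old.path.rstrip("/") == new.path.rstrip("/"):
        return "simple switch: trailing slash"

    # "latest" was replaced by a pinned or pre-release version.
    if "/latest/" in old.path and "/latest/" not in new.path:
        return "manual check: no longer points at 'latest'"

    # A bare host redirecting to a path may depend on the client (locale etc.).
    if old.path in ("", "/") and new.path not in ("", "/"):
        return "manual check: root redirect may be client-specific"

    # Everything else: use the reported URL, but confirm anchors still exist.
    return "simple switch: verify any anchor"
```

For example, the GitHub rate-limit link from this batch falls through to the last case, while both Grafana links are flagged for a manual check.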
Also, if you go to this issue and search for "Check for broken external links.", some of the info there might help.
I do like the idea of putting it in batches! One other idea might be to sort it by page where the broken link is, if it's not already. That would be going above and beyond though.
@marcel.amirault You know more about our broken links than I do. Would you be able to help Lee with some of his review questions?
I think using batches is great, but the 403s and 404s can be a little challenging. As a first step, how about we create batches of just the links that are redirecting? These links are really easy to fix, and also really easy for the TW team to review. Taking care of those would also drastically shrink the output of the link test job, which will make it easier for us to address the harder links. WDYT?
We can see there is already some Ruby in the project, so we could leverage Ruby to make life easier when sanitizing/preparing the output.
We could then create the issue directly, either using the GitLab Ruby gem or, as we probably only need to call one endpoint, a raw HTTP request.
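As a rough illustration of the sanitizing step, here is a minimal sketch that pulls the redirect entries out of the checker output and groups them by page, ready to be split into batches. It is written in Python purely for illustration (the same idea would carry over to Ruby), it assumes the output format matches the entries pasted in this issue, and the names `REDIRECT_RE`, `parse_redirects`, and `to_batches` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed entry format, taken from the output pasted in this issue:
#   <page>.html: [ ERROR ] external_links - broken reference to <old>:
#   link has moved permanently to '<new>'
REDIRECT_RE = re.compile(
    r"(?P<page>\S+\.html): \[ ERROR \] external_links - broken reference to "
    r"(?P<old>\S+): link has moved permanently to '(?P<new>[^']+)'"
)


def parse_redirects(output: str) -> dict:
    """Group (old_url, new_url) pairs by the page that contains the link."""
    by_page = defaultdict(list)
    # finditer also copes with several entries run together on one line.
    for match in REDIRECT_RE.finditer(output):
        by_page[match["page"]].append((match["old"], match["new"]))
    return dict(by_page)


def to_batches(by_page: dict, size: int = 10) -> list:
    """Split the pages, sorted by path, into batches of at most `size` pages."""
    pages = sorted(by_page)
    return [pages[i:i + size] for i in range(0, len(pages), size)]
```

Entries that are not redirects (403s, 404s) simply don't match the pattern, so this only picks up the easy ones.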
Do you think it's worth us moving forward with this, or is it just as easy for someone to run the CI job or nanoc task in GitPod every few weeks and spin up an issue by hand?
Yes please. Any effort to automate these monthly tasks, like the ones listed here, especially to create issues for the community, would be greatly appreciated.
@sselhorn In general we're moving towards using more JavaScript and less Ruby in this project. The proposed next version of the site will use a JavaScript site generator instead of a Ruby one, and so our supporting scripts will also be JavaScript/node.js.
The existing link checker comes from nanoc (the Ruby site generator). The next version of the site will not use nanoc, and so it will also not use this link checker.
I would advise against investing time in nanoc-specific features and instead recommend that we try a different tool for flagging broken links. We can make this change on the current site, with little risk (especially since our current link checker is kinda bad). I'd imagine a more modern tool will provide us with better output out-of-the-box. Just doing a quick search, this one looks nice: https://github.com/stevenvachon/broken-link-checker#readme
Just for context, how far are we from not using nanoc for the site generation? I tried to have a quick look to find out, but I couldn't really tell.
Anyway, even if we stick to nanoc, we should be able to replace it only for the test that checks for external links by using a JS tool like the one you suggested and running it against the generated site pages, right? That's basically what nanoc does now, if I understood it right.
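To make the "run it against the generated site pages" idea concrete, here is a minimal sketch of the first half of such a check: collecting the external links from the built HTML. It is shown in Python using only the standard library, purely to illustrate the approach; it is not how nanoc or the suggested JavaScript tool works internally, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class ExternalLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point at http(s) URLs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def external_links_in(html: str) -> list:
    """Return the external links found in one HTML document."""
    collector = ExternalLinkCollector()
    collector.feed(html)
    return collector.links


def external_links_by_page(site_dir: str) -> dict:
    """Map each generated page under `site_dir` to its external links."""
    result = {}
    for page in sorted(Path(site_dir).rglob("*.html")):
        links = external_links_in(page.read_text(encoding="utf-8"))
        if links:
            result[str(page)] = links
    return result
```

Calling `external_links_by_page("public")` after a site build would give the list of URLs to request. Because it reads the generated HTML, it doesn't depend on which generator produced the pages.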
Hey @zillemarco! I'd guess the move away from nanoc will take several months, but we haven't gotten into it enough to have a solid timeline yet. We're working on a few other things and also waiting for the release of nuxt.js version 3. If you're curious, there is a proof-of-concept of docs on nuxt over here: https://gitlab.com/oregand/gitlab-docs-v2
You're right about the nanoc link checker -- it's only used on that job that runs the test. There are (minimal) docs for it here: https://nanoc.app/doc/testing/
I'd be happy to help review MRs or test a JavaScript solution for this, feel free to ping me whenever. But if sticking with nanoc makes more sense for now, that's fine too, I'd just recommend keeping custom code minimal.
Let's see what @leetickett also thinks about this, but I think that if, in the end, we'll move towards removing the Ruby code from the docs, we might as well start somewhere. Replacing Ruby in one part of the pipeline, which seems to be "quite" unrelated to the actual site, might be worth taking a look at. I mean, if it doesn't complicate things too much.
There has been some anchor-related work on the tool recently: https://github.com/tcort/markdown-link-check/releases, but I haven't circled back to check if anchors work properly now. Given I originally started off with 3.9.0, there's a chance it works now.
A benefit of this tool is also a downside. It's really great to be able to run a link checker on source Markdown, because you don't need any sort of HTML transformation first. Of course, links not contained in Markdown are missed. For example, links in: https://gitlab.com/gitlab-org/gitlab-docs/-/blob/e777f2d81f6b17110e1775bc53f6304d4a953fae/content/index.erb would be missed. gitlab-development-kit docs aren't transformed and so this wasn't a problem for that project.
@leetickett I moved this issue into the main GitLab repo so it shows up in the list for contributors. I think it makes sense to have it in that repo, since all the fixes will be in that repo (for the most part).
@leetickett I've been poking at a few of these broken links as part of the monthly maintenance tasks. Some are an easy fix (like the missing trailing slash), and others need more digging since the target has been rewritten, moved, deleted, no longer valid/supported etc.
As of today I've got 2 MRs for this and probably will spin up a couple more soon: