What guidance can we give for tackling the different errors? Maybe tackle the obvious ones first, then re-run and look more carefully at the tricky ones?
Or ask for contributors to call out any challenging ones?
Latest list of 302s as of 27/07/22
Charts (WIP MR)
public/charts/charts/globals.html: [ ERROR ] external_links - broken reference to https://www.haproxy.com/blog/haproxy/proxy-protocol/: link has moved permanently to 'http://www.haproxy.com/blog/use-the-proxy-protocol-to-preserve-a-clients-ip-address/'
public/operator/adr/0001-record-architecture-decisions.html: [ ERROR ] external_links - broken reference to https://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions: link has moved permanently to 'https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions'
Runner (WIP MR)
public/runner/executors/custom_examples/libvirt.html: [ ERROR ] external_links - broken reference to https://docs.gitlab.com/ee/ssh/: link has moved permanently to 'https://docs.gitlab.com/ee/user/ssh.html'
public/runner/executors/kubernetes.html: [ ERROR ] external_links - broken reference to https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/: link has moved permanently to 'https://kubernetes.io/docs/concepts/windows/intro/'
public/runner/install/windows.html: [ ERROR ] external_links - broken reference to https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.diagnostics/get-winevent?view=powershell-7.1: link has moved permanently to 'https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.diagnostics/get-winevent?view=powershell-7.2&viewFallbackFrom=powershell-7.1'
@sselhorn I've started putting this together... my thinking is contributors can "claim" a batch at a time?
We just need to provide a few pointers. I wonder whether, if I attack batch 1 myself, I'll find enough variety to comment on the different scenarios?
Here's my attempt at thinking out loud while reviewing the 1st batch. Can I/we boil this down to some basic steps/suggestions/guidelines?
- public/charts/charts/globals.html: https://www.haproxy.com/blog/haproxy/proxy-protocol/: link has moved permanently to 'http://www.haproxy.com/blog/use-the-proxy-protocol-to-preserve-a-clients-ip-address/'
I've skipped this one for the moment as it's in a different repo. I think just this one slipped through?
This was a nice perplexing one... Hitting the URL did indeed return a 404, but still loaded the page AOK.
It wasn't immediately apparent what was going on, so I found the gitlab link on the navigation menu and followed it. Very subtle... but the link was missing the trailing forward slash... e.g. https://casdoor.org/docs/integration/gitlab
- public/ee/administration/geo/disaster_recovery/index.html: https://docs.microsoft.com/en-us/azure/postgresql/howto-read-replicas-portal: link has moved permanently to 'https://docs.microsoft.com/en-us/azure/postgresql/single-server/how-to-read-replicas-portal'
This one was reasonably straightforward, but the link was to an anchor, so I had to visit the page to ensure the anchor was still valid.
- public/ee/administration/geo/replication/object_storage.html: https://cloud.google.com/storage-transfer/docs/: link has moved permanently to 'https://cloud.google.com/storage-transfer/docs/overview'
I'm not seeing the same response. I wonder if this is a false positive?
- public/ee/administration/integration/mailgun.html:
  - https://app.mailgun.com/app/account/security/api_keys: link has moved permanently to 'https://login.mailgun.com/login'
  - https://www.mailgun.com/blog/a-guide-to-using-mailguns-webhooks/: link has moved permanently to 'https://www.mailgun.com/blog/product/a-guide-to-using-mailguns-webhooks/'
- public/ee/administration/monitoring/performance/grafana_configuration.html:
  - https://grafana.com/docs/grafana/latest/reference/export_import/: link has moved permanently to 'http://grafana.com/docs/grafana/v7.5/dashboards/export-import/'
  - https://grafana.com/docs/grafana/latest/installation/: link has moved permanently to 'http://grafana.com/docs/grafana/next/setup-grafana/installation/'
Here, following the links did indeed redirect to the addresses reported BUT the first was a specific v7.5 doc, which I didn't think we wanted, so followed another link through to the latest doc to get the correct URL.
Similarly, the latter link pointed to the "next" version of the docs, but we wanted the latest version, so "next" needed replacing with "latest" in the URL.
- public/ee/administration/monitoring/performance/performance_bar.html: https://developers.google.com/web/fundamentals/performance/critical-rendering-path/measure-crp: link has moved permanently to 'https://web.dev/critical-rendering-path-measure-crp/'
I had to google this one to find the new URL.
Google turned up something similar looking, but "we" don't think it is quite the same thing, so have removed the link.
- public/ee/administration/postgresql/external.html: https://docs.microsoft.com/en-us/azure/postgresql/howto-create-users: link has moved permanently to 'https://docs.microsoft.com/en-us/azure/postgresql/single-server/how-to-create-users'
The redirect was correct, and I just had to check the anchor was still valid.
- public/ee/administration/raketasks/github_import.html: https://docs.github.com/en/rest/reference/rate-limit: link has moved permanently to 'https://docs.github.com/en/rest/rate-limit'
This was a simple switch.
- public/ee/administration/raketasks/storage.html: https://support.gitlab.com: link has moved permanently to 'https://support.gitlab.com/hc'
I'm unsure whether we should update this one or not (we may even want to consider adding support.gitlab.com to our exclusion list?). My concern is that the redirection is based on the client (e.g. I get taken to https://support.gitlab.com/hc/en-us).
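The scenarios from the review notes above could be boiled down to a small triage helper. This is only a rough sketch in Python (not part of the docs tooling): the flag names and the regular expressions are assumptions made up for illustration, based purely on the examples in this batch. An empty result suggests a simple switch, but the target content (and any anchor) still needs a manual check.

```python
import re
from urllib.parse import urlsplit

# Assumed patterns, derived only from the examples in this thread.
VERSION_SEGMENT = re.compile(r"/(next|v\d+(\.\d+)*)/")
LOGIN_SEGMENT = re.compile(r"/(login|sign_in|signin)\b")


def triage(old_url: str, new_url: str) -> list[str]:
    """Return hypothetical flags describing why a redirect needs a closer look."""
    old, new = urlsplit(old_url), urlsplit(new_url)
    flags = []

    # The reported target drops HTTPS (seen with the haproxy and grafana links).
    if old.scheme == "https" and new.scheme == "http":
        flags.append("https-downgrade")

    # Same host and path apart from a trailing slash.
    if (
        old.netloc == new.netloc
        and old.path != new.path
        and old.path.rstrip("/") == new.path.rstrip("/")
    ):
        flags.append("trailing-slash-only")

    # The target is pinned to a specific or pre-release version of the docs.
    if VERSION_SEGMENT.search(new.path) and not VERSION_SEGMENT.search(old.path):
        flags.append("version-pinned")

    # The target is a login page, so the redirect probably depends on the client.
    if LOGIN_SEGMENT.search(new.path):
        flags.append("login-redirect")

    return flags
```

For example, the second grafana link above would come back flagged as both `https-downgrade` and `version-pinned`, which matches the manual findings.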
Also, if you go to this issue and search for "Check for broken external links", some of the info there might help.
I do like the idea of putting it in batches! One other idea might be to sort it by page where the broken link is, if it's not already. That would be going above and beyond though.
@marcel.amirault You know more about our broken links than I do. Would you be able to help Lee with some of his review questions?
I think using batches is great, but the 403's and 404's can be a little challenging. As a first step, how about we create batches of just the links that are redirecting? These links are really easy to fix, and also really easy for the TW team to review. Taking care of those would also drastically shrink the output of the link test job, which will make it easier for us to address the harder links. WDYT?
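As a rough illustration of the batching idea, here is a minimal Python sketch. It assumes the findings are already available as `(page, url, status)` tuples with a numeric HTTP status, which is a hypothetical input shape and not what the link test job prints today. It keeps only the redirects, sorts them by page, and splits them into fixed-size batches.

```python
# Status codes treated as "redirecting" in this sketch (an assumption).
REDIRECT_CODES = {301, 302, 307, 308}


def redirect_batches(findings, batch_size=20):
    """Keep only redirects, sort by page then URL, and split into batches.

    findings: iterable of (page, url, status) tuples (hypothetical shape).
    """
    redirects = sorted(
        (f for f in findings if f[2] in REDIRECT_CODES),
        key=lambda f: (f[0], f[1]),
    )
    return [
        redirects[i:i + batch_size]
        for i in range(0, len(redirects), batch_size)
    ]
```

The 403s and 404s are simply dropped here, so they would need their own list for a later pass.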
We can see there is already some Ruby in the project, so we could leverage Ruby to make life easier sanitizing/preparing the output.
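To give an idea of what preparing the output might involve, here is a minimal sketch. It is written in Python purely for illustration (the suggestion above is Ruby), and the regular expression is an assumption based on the "moved permanently" lines quoted in this issue, so it may not cover every shape the checker emits.

```python
import re

# Assumed line format, copied from the examples in this issue.
REDIRECT_LINE = re.compile(
    r"^(?P<page>\S+): \[ ERROR \] external_links - broken reference to "
    r"(?P<old>\S+): link has moved permanently to '(?P<new>[^']+)'$"
)


def parse_redirects(output: str) -> list[dict]:
    """Extract page, old URL and new URL from each 'moved permanently' line."""
    records = []
    for line in output.splitlines():
        match = REDIRECT_LINE.match(line.strip())
        if match:  # other errors (for example 404s) are skipped
            records.append(match.groupdict())
    return records
```

Each record is a dict with `page`, `old` and `new` keys, which should be enough to build the batch lists in the issue description.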
We could then create the issue directly, either using the gitlab Ruby gem or, as we probably only need to call one endpoint, a raw HTTP request.
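For the raw HTTP option, the request would be a single `POST` to the issues endpoint of the GitLab REST API. The sketch below (in Python, for illustration only) just builds the request without sending it. The project path, title, description and token in the usage note are placeholders, and `build_issue_request` is a hypothetical helper name.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_issue_request(project_path, title, description, token,
                        api_base="https://gitlab.com/api/v4"):
    """Build (but do not send) a POST request that creates a GitLab issue."""
    # The project path must be URL-encoded, so "/" becomes "%2F".
    url = f"{api_base}/projects/{quote(project_path, safe='')}/issues"
    body = json.dumps({"title": title, "description": description}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )
```

Sending it would then be `urllib.request.urlopen(build_issue_request("gitlab-org/gitlab", "Fix redirecting links", "...", token))`, where `token` is an access token with API scope.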
Do you think it's worth us moving forward with this, or is it just as easy for someone to run the CI job or nanoc task in GitPod every few weeks and spin up an issue by hand?
Yes please. Any effort to automate these monthly tasks, like the ones listed here, especially to create issues for the community, would be greatly appreciated.
@sselhorn In general we're moving towards using more JavaScript and less Ruby in this project. The proposed next version of the site will use a JavaScript site generator instead of a Ruby one, and so our supporting scripts will also be JavaScript/node.js.
The existing link checker comes from nanoc (the Ruby site generator). The next version of the site will not use nanoc, and so it will also not use this link checker.
I would advise against investing time in nanoc-specific features and instead recommend that we try a different tool for flagging broken links. We can make this change on the current site, with little risk (especially since our current link checker is kinda bad). I'd imagine a more modern tool will provide us with better output out-of-the-box. Just doing a quick search, this one looks nice: https://github.com/stevenvachon/broken-link-checker#readme
Just for context, how far are we from not using nanoc for the site generation? I had a quick look to try and find out, but I couldn't really tell.
Anyway, even if we stick to nanoc, we should be able to replace it only for the test that checks for external links by using a JS tool like the one you suggested and running it against the generated site pages, right? That's basically what nanoc does now, if I understood it right.
Hey @zillemarco! I'd guess the move away from nanoc will take several months, but we haven't gotten into it enough to have a solid timeline yet. We're working on a few other things and also waiting for the release of nuxt.js version 3. If you're curious, there is a proof-of-concept of docs on nuxt over here: https://gitlab.com/oregand/gitlab-docs-v2
You're right about the nanoc link checker -- it's only used on that job that runs the test. There are (minimal) docs for it here: https://nanoc.app/doc/testing/
I'd be happy to help review MRs or test a JavaScript solution for this, feel free to ping me whenever. But if sticking with nanoc makes more sense for now, that's fine too, I'd just recommend keeping custom code minimal.
Let's see what @leetickett also thinks about this, but I think that if, in the end, we'll move towards removing the Ruby code from the docs, we might as well start somewhere. Replacing Ruby in one part of the pipeline, which seems to be "quite" unrelated to the actual site, might be worth taking a look at. I mean, if it doesn't complicate things too much.
There has been some anchor-related work on the tool recently: https://github.com/tcort/markdown-link-check/releases, but I haven't circled back to check if anchors work properly now. Given I originally started off with 3.9.0, there's a chance it works now.
A benefit of this tool is also a downside. It's really great to be able to run a link checker on source Markdown, because you don't need any sort of HTML transformation first. Of course, links not contained in Markdown are missed. For example, links in: https://gitlab.com/gitlab-org/gitlab-docs/-/blob/e777f2d81f6b17110e1775bc53f6304d4a953fae/content/index.erb would be missed. gitlab-development-kit docs aren't transformed and so this wasn't a problem for that project.
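To make the trade-off concrete, here is a deliberately naive Python sketch of scanning Markdown source for external links. It is not how markdown-link-check works internally. The regular expression is an assumption that only handles inline `[text](url)` links, and the sample text is made up.

```python
import re

# Naive pattern: inline Markdown links only, http(s) targets only.
INLINE_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def external_links(markdown_text: str) -> list[str]:
    """Return the external URLs found in inline Markdown links."""
    return INLINE_LINK.findall(markdown_text)


sample = (
    "See [HAProxy](https://www.haproxy.com/blog/) for details.\n"
    '<a href="https://example.com/from-erb">ERB link</a>\n'
)
found = external_links(sample)
```

Here `found` contains only the HAProxy URL. The `<a href>` link, which is the kind of link that lives in an `.erb` template, is never seen by a Markdown-only scan.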
@leetickett I moved this issue into the main GitLab repo so it shows up in the list for contributors. I think it makes sense to have it in that repo, since all the fixes will be in that repo (for the most part).
@leetickett I've been poking at a few of these broken links as part of the monthly maintenance tasks. Some are an easy fix (like the missing trailing slash), and others need more digging since the target has been rewritten, moved, deleted, no longer valid/supported etc.
As of today I've got 2 MRs for this and probably will spin up a couple more soon:
@measutosh If you're interested in working on some doc fixes, you might mention @leetickett on this issue and choose one of the batches of broken links above to work on. Finding the correct links is not easy, but is always appreciated!
Thanks Suzanne for the suggestions.
Hey @leetickett, I would like to be part of this issue. I went through the comments but had a hard time understanding how to work on it, as I am new to this.
Could you please help me out and point me to one of the simple ones from the batches?
I just updated the list in the issue description to only include pages which have moved. These are the simplest to address.
If you can focus on the GitLab list. Maybe pick the first 10 - 20 pages.
Each line tells you the old link and the new link. You should do your best to confirm the new link content does actually still seem correct (based on the context of the link), but fingers crossed the majority will be spot on.
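If it helps, the mechanical part of each fix can be sketched in a few lines of Python. This is an illustration only, with a hypothetical helper name. It swaps the old URL for the new one in a page's Markdown text, keeps any `#anchor` that follows, and avoids touching longer URLs that merely start with the old one. The anchor and the target content still need checking by hand.

```python
import re


def replace_link(markdown_text: str, old_url: str, new_url: str):
    """Replace old_url with new_url; return (new_text, number_of_replacements).

    The lookahead only matches when the URL ends there: before ")", "#",
    ">", whitespace, or the end of the text. A URL directly followed by
    other punctuation (such as a full stop) is left alone.
    """
    pattern = re.compile(re.escape(old_url) + r"(?=[)#\s>]|$)")
    # A function is used as the replacement so new_url is inserted literally.
    return pattern.subn(lambda _match: new_url, markdown_text)
```

A count of `0` is a hint that the link is written differently in the source (for example with a trailing slash) and needs finding manually.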
@leetickett, I would definitely need some guidance here. Based on my understanding, I have tried to fix one link from the GitLab group (see this one #93724).
Please have a look and tell me whether I am on the right path.
If yes, I'll start working on the rest; if not, please tell me where I went wrong.
Does this count as a contribution in the hackathon?
After this, can I continue with the rest of the links?
Could you please update the paths to the files like you did above for the plan_your_upgrade.md file? It is difficult to find a file from just its name and a few prefixes of the path.
Hi @leetickett, I can see that most of the broken links already have a WIP MR. If there are any other broken links that still need fixing, I would like to pick them up.
Hey @gitlab-bot, @leetickett, it appears this issue does not meet all of the required criteria for the quick win label, so it has been removed.
If you believe this issue is still relevant, ensure it meets the criteria for quick win issues, then re-add the label.
Note: Our next GitLab Hackathon commences on Thursday (2025-01-23).
Re-add the label before this date to increase the likelihood of your issue being picked up by a community contributor.
Please direct any questions to @gitlab-org/developer-relations/contributor-success.