
doc: Fix URLs broken by master branches being renamed to main

What does this MR do?

Some projects have renamed their "master" branches to "main", so URLs that point at those branches need to be updated. I used the bash script below to list the broken URLs together with the source files that contain them. It uses typical shell commands plus GNU Parallel, which you may need to install; GNU Parallel speeds up checking the 1017 URLs with curl. The script reports 18 unreachable URLs, but some of them are unreachable by design: they point to localhost, contain variables, or have POST values and are actually reachable even though curl fails, which produces false negatives. I have fixed all the URLs I could.

Note that I cloned this repository to ~/src/gitlab/ and ran the script from the directory ~/src/fix-gitlab-master-links/.

# File: extract-and-check-urls.bash
SRC=~/src/gitlab
WD=$PWD

log () {
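    # Pass the message as an argument rather than embedding it in the
    # format string, so a '%' in the message is printed literally.
    printf '[%3d] %s\n' "$SECONDS" "$1"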
}

# Generate a list of URLs containing the word "master" which may have
# been changed to the "main" branch, and include the file path in
# which that URL was found.
sentinel="$WD/.extract_done"
output="$WD/file_url.txt"
if ! [[ -f "$sentinel" ]]; then
    log "Extracting URLs containing the word 'master' ..."
    # Use subshell in case cd fails.
    (
	cd "$SRC" || exit
	# Find URLs per https://unix.stackexchange.com/a/181264
	find . \
	     -type f \
	     -exec grep -EHo '\((http|https)://[^\)]+master[^\)]+\)' {} + \
	     > "$output"
	# We can't check exit code, because grep will produce non-zero
	# exit codes for non-matching files.  Therefore check if the
	# output file has non-zero size.
	test -s "$output" && touch "$sentinel"
    )
else
    log "Found existing $(basename "$output") containing the word 'master'"
fi

# Split file paths from URLs, and remove flanking parentheses from URLs.
sentinel="$WD/.split_done"
input="$output"
files="$WD/file.txt"
urls="$WD/url.txt"
cut -d: -f1  < "$input" > "$files"
cut -d: -f2- < "$input" | sed -E -e 's#^.##' -e 's#.$##' > "$urls"

# Use GNU parallel and curl to check that the URLs are valid.
sentinel="$WD/.curl_done"
output="$WD/url_checks.txt"
if ! [[ -f "$sentinel" ]]; then
    log "Checking whether URLs are valid ..."
    (
	cd "$SRC" || exit
	# Verify URL per https://unix.stackexchange.com/a/475067
	parallel --keep-order --joblog "$output" \
		 curl --head --silent --fail :::: "$urls"
	test -s "$output" && touch "$sentinel"
    )
else
    log "Found existing $(basename "$output") of URL checks"
fi

log "Done!"
count=$(awk '$7 != 0' "$output" | tail -n +2 | wc -l)
log "Found $count broken links:"
input=$output
output="$WD/linenos.txt"
awk '$7 != 0 {print $1}' "$input" | tail -n +2 | sed 's#^#NR==#' > "$output"
input=$output
paste \
       <( awk -f "$input" "$files") \
       <( awk -f "$input" "$urls") \
    | cat <( echo FILE BROKEN_URL )  - | column -t
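
The final step depends on the layout of the GNU Parallel job log: a tab-separated file with a header row, where column 1 (Seq) is the job sequence number, which corresponds to the line number in url.txt, and column 7 (Exitval) is the exit code of curl. The sketch below reproduces that filtering on a small hand-written log. The sample rows and the helper names `sample_joblog` and `failed_seqs` are assumptions for illustration, not data from the real run.

```shell
#!/usr/bin/env bash
# Hypothetical sample mimicking a GNU Parallel --joblog file
# (tab-separated; the URLs and timings are made up for illustration).
sample_joblog() {
    printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
    printf '1\t:\t1600000000.000\t0.512\t0\t310\t0\t0\tcurl --head --silent --fail https://example.com/a\n'
    printf '3\t:\t1600000000.100\t0.731\t0\t0\t22\t0\tcurl --head --silent --fail https://example.com/c\n'
    printf '2\t:\t1600000000.050\t0.901\t0\t0\t6\t0\tcurl --head --silent --fail https://example.com/b\n'
}

# Print the sequence numbers of jobs whose exit value is non-zero,
# skipping the header row explicitly and sorting into input order.
failed_seqs() {
    awk -F'\t' 'NR > 1 && $7 != 0 { print $1 }' | sort -n
}

sample_joblog | failed_seqs
```

The rows in the sample are deliberately out of order, because log entries may be written as jobs finish rather than in input order. Selecting lines by sequence number, as the script does with its generated `NR==` patterns, does not depend on the order of the log rows.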

Output of second (memoized) run:

omsai@xm1:~/src/fix-gitlab-master-links$ bash extract-and-check-urls.bash
[  0] Found existing file_url.txt containing the word 'master'
[  0] Found existing url_checks.txt of URL checks
[  0] Done!
[  0] Found 18 broken links:
FILE                                                                   BROKEN_URL
./lib/gitlab/golang.rb                                                 https://github.com/golang/go/blob/master/src/cmd/go/internal/modfetch/pseudo.go
./ee/spec/services/ee/issues/build_from_vulnerability_service_spec.rb  http://localhost/#{project.full_path}/-/blob/master/maven/src/main/java/com/gitlab/security_products/tests/App.java#L29
./doc/development/documentation/site_architecture/index.md             https://gitlab.com/gitlab-org/gitlab-docs/-/tree/master/dockerfiles
./doc/development/documentation/testing.md                             https://gitlab.com/gitlab-org/gitlab-development-kit/-/tree/master/doc/.vale/gitlab
./doc/development/documentation/testing.md                             https://gitlab.com/search?utf8=✓&snippets=false&scope=&repository_ref=master&search=path%3Adoc%2F.vale%2Fgitlab+Suggestion%3A&group_id=9970&project_id=278964
./doc/development/documentation/testing.md                             https://gitlab.com/search?utf8=✓&snippets=false&scope=&repository_ref=master&search=path%3Adoc%2F.vale%2Fgitlab+Warning%3A&group_id=9970&project_id=278964
./doc/development/documentation/testing.md                             https://gitlab.com/search?utf8=✓&snippets=false&scope=&repository_ref=master&search=path%3Adoc%2F.vale%2Fgitlab+Error%3A&group_id=9970&project_id=278964
./doc/development/i18n/merging_translations.md                         https://gitlab.com/gitlab-org/gitlab/-/branches/all?utf8=✓&search=master-i18n
./doc/development/windows.md                                           https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/windows-containers/blob/master/cookbooks/preinstalled-software/README.md
./doc/development/windows.md                                           https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/windows-containers/-/blob/master/packer.json#L2-10
./doc/development/code_intelligence/index.md                           https://github.com/sourcegraph/sourcegraph/blob/master/doc/user/code_intelligence/writing_an_indexer.md
./doc/development/auto_devops.md                                       https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Auto-DevOps.gitlab-ci.yml
./doc/development/architecture.md                                      https://gitlab.com/gitlab-org/gitlab-runner/blob/master/README.md
./doc/user/application_security/dast/index.md                          https://gitlab.com/gitlab-org/security-products/dast/-/tree/master/test/end-to-end/expect
./doc/user/application_security/dependency_scanning/index.md           https://gitlab.com/gitlab-org/security-products/release/blob/master/docs/release_process.md
./doc/ci/runners/build_cloud/windows_build_cloud.md                    https://gitlab.com/gitlab-org/ci-cd/custom-executor-drivers/autoscaler/tree/master/docs/readme.md
./doc/ci/runners/build_cloud/windows_build_cloud.md                    https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/windows-containers/blob/master/cookbooks/preinstalled-software/README.md
./spec/features/markdown/copy_as_gfm_spec.rb                           https://gitlab.com/gitlab-org/gitlab-foss/badges/master/coverage.svg?job=coverage
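
Some entries in this list are expected to fail whatever the branch is called, for example the `http://localhost/...` URL that contains a Ruby `#{...}` interpolation. A minimal sketch of a filter that drops such entries before manual review follows. The helpers `is_checkable` and `filter_checkable` and their patterns are assumptions for illustration and are not part of the script used for this MR.

```shell
#!/usr/bin/env bash
# Return success if the URL looks worth checking with curl.
# The patterns are illustrative assumptions: localhost hosts and
# URLs containing unexpanded "#{...}" or "${...}" placeholders.
is_checkable() {
    local url=$1
    case $url in
        http://localhost*|https://localhost*) return 1 ;;
        http://127.0.0.1*|https://127.0.0.1*) return 1 ;;
        *'#{'*|*'${'*) return 1 ;;
    esac
    return 0
}

# Read URLs on stdin and print only the checkable ones.
filter_checkable() {
    local url
    while IFS= read -r url; do
        if is_checkable "$url"; then
            printf '%s\n' "$url"
        fi
    done
}

# Sample inputs modeled on the list above (the first path is shortened).
printf '%s\n' \
    'http://localhost/#{project.full_path}/-/blob/master/README.md' \
    'https://gitlab.com/gitlab-org/gitlab-runner/blob/master/README.md' \
    | filter_checkable
```

If a filter like this were applied, it would need to run on the final report or on the paired file and URL lines, not on url.txt alone, so that the line numbers in file.txt and url.txt stay aligned.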

Related issues

None?

Author's checklist

If you are only adding documentation, do not add any of the following labels:

  • ~"feature"
  • ~"frontend"
  • ~"backend"
  • ~"bug"
  • ~"database"

These labels cause the MR to be added to code verification QA issues.

Review checklist

Documentation-related MRs should be reviewed by a Technical Writer for a non-blocking review, based on Documentation Guidelines and the Style Guide.

  • If the content requires it, ensure the information is reviewed by a subject matter expert.
  • Technical writer review items:
    • Ensure docs metadata is present and up-to-date.
    • Ensure the appropriate labels are added to this MR.
    • If relevant to this MR, ensure content topic type principles are in use, including:
      • The headings should be something you'd do a Google search for. Instead of "Default behavior", say something like "Default behavior when you close an issue".
      • The headings (other than the page title) should be active. Instead of "Configuring GDK", say something like "Configure GDK".
      • Any task steps should be written as a numbered list.
      • If the content still needs to be edited for topic types, you can create a follow-up issue with the docs-technical-debt label.
  • Review by assigned maintainer, who can always request/require the above reviews. Maintainer's review can occur before or after a technical writer review.
  • Ensure a release milestone is set.
