Do not normalize canonical URLs
We implemented canonical URLs in !211 (merged) a long time ago (see layouts/canonical_urls.html), so that versioned docs don't end up as duplicate URLs in Google searches. However, it turns out they don't work on versioned docs.
The cause is the find and replace that the post-build script runs: for the versioned docs, the canonical URLs are replaced along with every other matching link.
We need to exclude the canonical URL occurrences from sed:
If you are doing a substitution with sed but need to exclude a specific line or pattern, that can be accomplished by prefixing an exclusion and using !.
The exclusion must come before the sed replacement command, and it applies only to the expression it prefixes, so we repeat it in each -e occurrence.
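To illustrate why the exclusion is repeated per expression, here is a minimal sketch. The example.com URLs, the data-alt attribute, and the /v1/ prefix are made up for illustration and are not part of the docs site.

```shell
#!/bin/sh
# Hypothetical input line; only rel="canonical" mirrors the real markup.
line='<link rel="canonical" href="https://example.com/a/" data-alt="https://example.com/b/" />'

# Exclusion on the first expression only: the second -e has no address,
# so it still rewrites the canonical line.
partial=$(printf '%s\n' "$line" | sed \
  -e '/rel="canonical/! s#https://example.com/a/#/v1/a/#g' \
  -e 's#https://example.com/b/#/v1/b/#g')

# Exclusion repeated on every expression: the canonical line is left untouched.
full=$(printf '%s\n' "$line" | sed \
  -e '/rel="canonical/! s#https://example.com/a/#/v1/a/#g' \
  -e '/rel="canonical/! s#https://example.com/b/#/v1/b/#g')

printf 'partial: %s\n' "$partial"
printf 'full:    %s\n' "$full"
```

In the first run only the a/ URL survives, while in the second run the whole line is printed unchanged.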
Related issues
Closes #1568 (closed).
Test sed in an example
Replace any occurrences starting with ="https://docs.gitlab.com/ee/ with ="/15.10/ee/, but exclude those on lines that contain rel="canonical or property="og:url.
sed -e '/\(rel="canonical\|property="og:url\)/! s#="https://docs.gitlab.com/ee/#="/15.10/ee/#g' -e '/\(rel="canonical\|property="og:url\)/! s#="https://docs.gitlab.com/omnibus/#="/15.10/omnibus/#g' <<EOF
<link rel="canonical" href="https://docs.gitlab.com/ee/api/index.html" />
<meta property="og:url" content="https://docs.gitlab.com/ee/api/index.html" />
href="https://docs.gitlab.com/ee/ci/yaml/includes.html"
href="https://docs.gitlab.com/omnibus/architecture/"
EOF
You should see the following:
<link rel="canonical" href="https://docs.gitlab.com/ee/api/index.html" />
<meta property="og:url" content="https://docs.gitlab.com/ee/api/index.html" />
href="/15.10/ee/ci/yaml/includes.html"
href="/15.10/omnibus/architecture/"
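The same check can be made self-verifying. The sketch below is an assumption-laden variant, not the actual scripts/normalize-links.sh: the normalize function name is hypothetical, and it uses sed -E (extended regular expressions) in place of the \| alternation so that it does not depend on GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: rewrites absolute docs links read from stdin to a
# versioned path, skipping lines that carry canonical or og:url metadata.
normalize() {
  version="$1"
  # Address matching the lines to protect; the trailing '!' negates it.
  exclude='/(rel="canonical|property="og:url)/!'
  sed -E \
    -e "${exclude} s#=\"https://docs.gitlab.com/ee/#=\"/${version}/ee/#g" \
    -e "${exclude} s#=\"https://docs.gitlab.com/omnibus/#=\"/${version}/omnibus/#g"
}

input='<link rel="canonical" href="https://docs.gitlab.com/ee/api/index.html" />
<meta property="og:url" content="https://docs.gitlab.com/ee/api/index.html" />
href="https://docs.gitlab.com/ee/ci/yaml/includes.html"
href="https://docs.gitlab.com/omnibus/architecture/"'

expected='<link rel="canonical" href="https://docs.gitlab.com/ee/api/index.html" />
<meta property="og:url" content="https://docs.gitlab.com/ee/api/index.html" />
href="/15.10/ee/ci/yaml/includes.html"
href="/15.10/omnibus/architecture/"'

actual=$(printf '%s\n' "$input" | normalize 15.10)

# Compare the rewritten text with the expected output shown above.
if [ "$actual" = "$expected" ]; then
  echo "PASS"
else
  echo "FAIL"
  printf '%s\n' "$actual"
fi
```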
Test in the docs site
- Check out the branch.
- Remove public and rebuild the site. We need to build the production site to populate the canonical URLs:
  make clean && NANOC_ENV=production make compile
- Copy public to a version:
  cp -a public 15.10
- Run the script:
  scripts/normalize-links.sh . 15.10
- Open a few HTML files that contain external links and verify that:
  - The following lines exist:
    <link rel="canonical" href="https://docs.gitlab.com/......" />
    <meta property="og:url" content="https://docs.gitlab.com/....." />
  - There are no occurrences of href="https://docs.gitlab.com in the rest of the file.
For example, run:
diff 15.10/ee/raketasks/backup_gitlab.html public/ee/raketasks/backup_gitlab.html
public should contain links to https://docs.gitlab.com/omnibus/, and in 15.10 those should have been replaced by /15.10/omnibus/. The canonical URLs should be identical in both and not shown in the diff, so the following commands should print nothing:
diff 15.10/ee/raketasks/backup_gitlab.html public/ee/raketasks/backup_gitlab.html | grep canonical
diff 15.10/ee/raketasks/backup_gitlab.html public/ee/raketasks/backup_gitlab.html | grep og:url
Test another file, it should yield similar results:
diff 15.10/omnibus/settings/backups.html public/omnibus/settings/backups.html
diff 15.10/omnibus/settings/backups.html public/omnibus/settings/backups.html | grep canonical
diff 15.10/omnibus/settings/backups.html public/omnibus/settings/backups.html | grep og:url
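The manual checks above could be scripted roughly as follows. This is only a sketch: check_pair is a hypothetical helper, and the demo runs against small temporary fixture files rather than the real public and 15.10 directories.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the canonical and og:url lines are the
# same in both files and the versioned copy has no absolute docs href left
# outside the canonical line.
check_pair() {
  original="$1"
  versioned="$2"
  # diff exits non-zero when the files differ, which is expected here,
  # so only its output is inspected.
  if diff "$versioned" "$original" | grep -E -q 'rel="canonical"|property="og:url"'; then
    echo "canonical or og:url differs in $versioned"
    return 1
  fi
  # Ignore the canonical line itself, which legitimately keeps the full URL.
  if grep -v 'rel="canonical"' "$versioned" | grep -q 'href="https://docs.gitlab.com'; then
    echo "absolute docs link left in $versioned"
    return 1
  fi
  return 0
}

# Demo with throwaway fixtures standing in for public/ and 15.10/.
tmp=$(mktemp -d)
printf '%s\n' \
  '<link rel="canonical" href="https://docs.gitlab.com/ee/a.html" />' \
  'href="https://docs.gitlab.com/omnibus/"' > "$tmp/public.html"
printf '%s\n' \
  '<link rel="canonical" href="https://docs.gitlab.com/ee/a.html" />' \
  'href="/15.10/omnibus/"' > "$tmp/versioned.html"

if check_pair "$tmp/public.html" "$tmp/versioned.html"; then
  status=ok
else
  status=failed
fi
echo "$status"
rm -rf "$tmp"
```

Against a real build, the call would instead look like check_pair public/ee/raketasks/backup_gitlab.html 15.10/ee/raketasks/backup_gitlab.html.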