Fix heading anchor links opening in a new tab for non-ASCII headings
What does this MR do and why?
Fixes #602230 (closed). Manually verified on GDK-in-a-box.
After the fix, CJK heading anchors no longer have target="_blank",
and clicking them navigates within the same page, like ASCII-only headings:
Root cause
-
Comrak parser includes non-ASCII characters directly in the generated
href:$ comrak --header-id-prefix 'user-content-' <<< '#### Reproduction Heading - 再現用の見出しです' <h4><a href="#reproduction-heading---再現用の見出しです" aria-hidden="true" class="anchor" id="user-content-reproduction-heading---再現用の見出しです"></a>Reproduction Heading - 再現用の見出しです</h4>The link then lands in
Banzai::Filter::ExternalLinkFilter, where it's treated as external for the reasons that follow. -
URI.parse()on the non-ASCII fragment raisesURI::InvalidURIError:$ ruby -e 'require "uri"; URI.parse("#reproduction-heading---再現用の見出しです")' /.../lib/uri/rfc3986_parser.rb:84:in `split': URI must be ascii only "#reproduction-heading---\u518D\u73FE\u7528\u306E\u898B\u51FA\u3057\u3067\u3059" (URI::InvalidURIError) ...
Fix
Skip fragment-only links (where href starts with #) at the top of call.
(URI::InvalidURIError never fires and addressable_uri never becomes nil.)
- 70a311b0: Add tests for fragment-only links –
🔴 RED at this point due to the existing bug (job log) - 9a1b30dc: Skip fragment-only links –
🟢 GREEN
Notes
I went with the narrowest change here.
The check could just as well live in internal_url?, and there may be a broader fix that makes more sense.
Open to suggestions.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
