Fix heading anchor links opening in a new tab for non-ASCII headings

What does this MR do and why?

Fixes #602230 (closed). Manually verified on GDK-in-a-box.

After the fix, CJK heading anchors no longer have target="_blank", and clicking them navigates within the same page, like ASCII-only headings:

image

Root cause

  1. Comrak parser includes non-ASCII characters directly in the generated href:

    $ comrak --header-id-prefix 'user-content-' <<< '#### Reproduction Heading - 再現用の見出しです'
    <h4><a href="#reproduction-heading---再現用の見出しです" aria-hidden="true" class="anchor" id="user-content-reproduction-heading---再現用の見出しです"></a>Reproduction Heading - 再現用の見出しです</h4>

    The link then lands in Banzai::Filter::ExternalLinkFilter, where it's treated as external for the reasons that follow.

  2. URI.parse() on the non-ASCII fragment raises URI::InvalidURIError:

    $ ruby -e 'require "uri"; URI.parse("#reproduction-heading---再現用の見出しです")'
    /.../lib/uri/rfc3986_parser.rb:84:in `split': URI must be ascii only "#reproduction-heading---\u518D\u73FE\u7528\u306E\u898B\u51FA\u3057\u3067\u3059" (URI::InvalidURIError)
    ...
  3. addressable_uri becomes nil

  4. target="_blank" gets added since SCHEMES includes nil

Fix

Skip fragment-only links (where href starts with #) at the top of call. (URI::InvalidURIError never fires and addressable_uri never becomes nil.)

🔴 RED → 🟢 GREEN cycle:

  1. 70a311b0: Add tests for fragment-only links – 🔴 RED at this point due to the existing bug (job log)
  2. 9a1b30dc: Skip fragment-only links – 🟢 GREEN

Notes

I went with the narrowest change here. The check could just as well live in internal_url?, and there may be a broader fix that makes more sense. Open to suggestions.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by skkzsh

Merge request reports

Loading