SanitizeNodeLink handles URLs that can't decode to valid UTF-8
What does this MR do and why?
SanitizeNodeLink parses and normalises the target URI before determining whether it looks safe. If this normalisation fails, we consider the link invalid (and remove it according to remove_invalid_links).
SanitizeLinkFilter raises an error when normali... (#601088 - closed) noted that the normalisation process may raise ArgumentError if the URI contains bytes that produce a badly-encoded string. For example, http://www.gitlab%b3.com will parse with a hostname of www.gitlab\xb3.com, which isn't valid UTF-8.
We now handle that ArgumentError, matching on the message.
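As a rough illustration of the approach (not the code in this MR), the sketch below rescues ArgumentError only when its message indicates an encoding problem, and re-raises anything else. The helpers normalise_host and safe_normalise_host are hypothetical, the message text "invalid byte sequence in UTF-8" is an assumption about what the real normaliser raises, and Ruby's standard URI module stands in for whatever URI library the filter actually uses.

```ruby
require "uri"

# Hypothetical stand-in for the filter's normalisation step. It percent-decodes
# the host and raises ArgumentError when the result is not valid UTF-8.
# The message text is an assumption, chosen to mirror Ruby's usual wording.
def normalise_host(encoded_host)
  host = URI.decode_www_form_component(encoded_host)
  raise ArgumentError, "invalid byte sequence in UTF-8" unless host.valid_encoding?

  host.downcase
end

# Hypothetical wrapper: returns the normalised host, or nil when the host
# cannot be decoded to valid UTF-8. Other ArgumentErrors are re-raised.
def safe_normalise_host(encoded_host)
  normalise_host(encoded_host)
rescue ArgumentError => e
  # Match on the message so unrelated ArgumentErrors are not swallowed.
  raise unless e.message.include?("invalid byte sequence")

  nil
end

safe_normalise_host("www.gitlab%b3.com") # nil, so the link is treated as invalid
safe_normalise_host("WWW.GitLab.com")    # "www.gitlab.com"
```

Matching on the message keeps the rescue narrow: an ArgumentError raised for an unrelated reason still surfaces instead of being silently treated as an invalid link.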
How to set up and validate locally
- Try typing www.gitlab%b3.com into a plain-text editor Markdown field in your local GDK and click preview. No preview appears; instead, an error appears at the top of the page.
- Check this branch out and wait for code reload.
- Try again. It previews correctly.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

