Remove Rinku from `AutolinkFilter`
As described in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17180, we use Rinku for `http`, `mailto` and `ftp` links, and extract links manually for all other protocols (those that Rinku does not support).
According to @rspeicher, Rinku was introduced for performance reasons, before we started caching rendered HTML fields.
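The protocol split described above comes down to a check on the URL scheme. The following is a minimal sketch, not GitLab's code. `RINKU_SCHEMES`, `url_scheme` and `handled_by_rinku?` are hypothetical names. `https` is included on the assumption that "http" above covers both.

```ruby
# Hypothetical sketch: decide whether a URL would go through Rinku or through
# the manual extraction path, based only on its scheme.
# "https" is an assumption; the issue itself names http, mailto and ftp.
RINKU_SCHEMES = %w[http https ftp mailto].freeze

# Returns the lower-cased URL scheme ("http", "rdar", ...) or nil if there is none.
def url_scheme(url)
  url[/\A([a-z][a-z0-9+.\-]*):/i, 1]&.downcase
end

# True when the scheme is one of the Rinku-handled protocols.
def handled_by_rinku?(url)
  RINKU_SCHEMES.include?(url_scheme(url))
end

handled_by_rinku?("http://link") # => true
handled_by_rinku?("rdar://link") # => false
```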
Problem with Rinku
When Rinku is used, the text of some links is parsed incorrectly, whereas our manual extraction parses the same text correctly.
Example:
The text `<rdar://link><another>` is extracted as the following `a` element:

```
#(Element:0x3ffa4e968808 {
  name = "a",
  attributes = [ #(Attr:0x3ffa4e9687cc { name = "href", value = "rdar://link" })],
  children = [ #(Text "rdar://link")]
})
```

This is done by https://gitlab.com/gitlab-org/gitlab-ce/blob/60d0568f8515c3ff18d693329087fe0f30ac9611/lib/banzai/filter/autolink_filter.rb#L120.
For `http` links, however, this method is never reached, because Rinku does everything automatically in https://gitlab.com/gitlab-org/gitlab-ce/blob/60d0568f8515c3ff18d693329087fe0f30ac9611/lib/banzai/filter/autolink_filter.rb#L65. As a result, `<http://link><another>` produces a different `a` element, with `><another` included in both the `href` and the link text:

```
#(Element:0x3fdccbe6c890 {
  name = "a",
  attributes = [ #(Attr:0x3fdccbe6c868 { name = "href", value = "http://link><another" })],
  children = [ #(Text "http://link><another")]
})
```
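For comparison, a manual extractor that stops at whitespace and angle brackets yields the same link for both schemes. The following is a hypothetical illustration. `LINK_PATTERN` and `extract_links` are made-up names, and the pattern is an assumption, not the regex `AutolinkFilter` actually uses.

```ruby
# Hypothetical pattern (not GitLab's): a scheme, "://", then a run of
# characters that excludes whitespace and the angle brackets "<" and ">".
LINK_PATTERN = %r{[a-z][a-z0-9+.\-]*://[^\s<>]+}i

# Returns every link found in +text+; the pattern has no capture groups,
# so String#scan returns the matched strings themselves.
def extract_links(text)
  text.scan(LINK_PATTERN)
end

extract_links("<rdar://link><another>") # => ["rdar://link"]
extract_links("<http://link><another>") # => ["http://link"]
```

Treating `http` the same way as other protocols would keep `><another` out of the generated `href`.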
Possible issues when removing Rinku
We use `Gitlab::StringRegexMarker` for matching other links, but this class only supports matching a single occurrence in a string. This is not a new issue, but it would become a much bigger problem for `http` links (e.g. only one issue reference given as a URL would be extracted).
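Collecting every occurrence, not only the first, can be done with `String#scan` and `Regexp.last_match`. The following is a hypothetical helper. `all_match_ranges` is an illustrative name and is not part of `Gitlab::StringRegexMarker`. It returns the character range of each match, which is the kind of information a multi-occurrence marker would need.

```ruby
# Hypothetical helper (not Gitlab::StringRegexMarker's API): returns the
# character range of every match of +regex+ in +text+, not just the first.
def all_match_ranges(text, regex)
  ranges = []
  text.scan(regex) do
    # Inside the scan block, Regexp.last_match is the current match.
    match = Regexp.last_match
    ranges << (match.begin(0)...match.end(0))
  end
  ranges
end

# Illustrative input: two issue URLs in one string.
text = "See http://example.com/issues/1 and http://example.com/issues/2"
url = %r{https?://[^\s<>]+}

all_match_ranges(text, url).map { |r| text[r] }
# => ["http://example.com/issues/1", "http://example.com/issues/2"]
```

By contrast, `text.match(url)` returns only the first match, which corresponds to the single-occurrence limitation described above.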