Skip to content

Do not interpret escaped markdown as shortcuts (like `@` and `#`)

What does this MR do?

Feature Flag: honor_escaped_markdown

My initial take:

We should honor the Backslash Escaping rules in the CommonMark Spec. This means that punctuation preceded by a backslash, such as \#, should not trigger any of our special references.

The difficulty is that we do the markdown processing straight out the the gate. Which means \# is properly converted to #. By the time it reaches our reference parsing, it's just a #, which looks valid to us.

Here are a few ideas on solving this

  • When you take a look at how the AST is built, https://spec.commonmark.org/dingus/?text=%231%0A%0A%5C%231%0A%0A, a \# gets translated into it's own <text>#</text> node. We might be able to walk the AST and replace such a node with the actual HTML entity, <text>&num;</text>, indicating that it should be a literal. Unfortunately, our Epcis reference, &, gets converted by markdown into &amp;, which we search for. So it won't work for everything. But could use the Unicode entity instead, &#x00023;. This would be ignored by our reference parsing code.

    This relies on the parser building a separate text node for literals, but I would be surprised if this changed, and tests would verify it for us.

    This does not work with the C version of the CommonMark parser, so it's not a valid option

  • Could try to move the reference parsing before processing markdown. However, we rely on the nokogiri parsed HTML in order to ensure we're not picking up references in code block, etc. I don't think this good solution.

  • Fork cmark-gfm and write a C extension that would write out the proper entity when processing a backslash. Unless it was was accepted upstream, then we would also probably have to fork https://github.com/gjtorikian/commonmarker.

  • Get support added upstream: https://github.com/commonmark/cmark/issues/366

As far as I can tell, the key seems to be marking a backslashed item in some way that our code knows the difference and can act accordingly.

I think the first option is probably the best.

The other enhancement (probably in a separate MR) would be detecting the backslash in the UI when typing, and not triggering the reference menu there.


Final Implementation

We now add a pre-filter Filter::MarkdownPreEscapeFilter and a post-filter Filter::MarkdownPostEscapeFilter to process the escapes. From the code:

In order to allow a user to short-circuit our reference shortcuts (such as # or !), the user should be able to escape them, like #. CommonMark supports this, however it removes all information about what was actually a literal. In order to short-circuit the reference, we must surround backslash escaped ASCII punctuation with a custom sequence. This way CommonMark will properly handle the backslash escaped chars but we will maintain knowledge (the sequence) that it was a literal.

We need to surround the character, not just prefix it. It could get converted into an entity by CommonMark and we wouldn't know how many characters there are. The entire literal needs to be surrounded with a span tag, which short-circuits our reference processing.

We can't use a custom HTML tag since we could be initially surrounding text in an href, and then CommonMark will not be able to parse links properly. So we use and

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #16827 (closed)

Edited by Brett Walker

Merge request reports