Convert code attributes earlier

Brett Walker requested to merge bw-lang-attribute into master

What does this MR do and why?

Related to #385036 (closed)

Currently our markdown parser emits the language of a code block as a lang attribute on the pre element, for example <pre lang="ruby">. This behavior comes from a special flag called GITHUB_PRE_LANG, which most other candidate markdown parsers do not support. In addition, using the lang attribute this way is not semantically correct; see the discussion in #385036 (closed).

However, we rely on that attribute to find certain elements in the DOM, such as math blocks.

This MR moves the language parsing out of the syntax filter and into its own filter at the beginning of the pipeline. This way we can set our own attribute, data-canonical-lang, and use that for searching the DOM.
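As a rough illustration of the idea, the following is a minimal sketch and not the code in this MR. It assumes the language sits on the pre element as described above, and it uses Ruby's REXML only to stay self-contained, so the input must be well-formed XML. The names convert_code_language and math_blocks are hypothetical.

```ruby
require "rexml/document"

# Hypothetical early pipeline step: move the parser's `lang` attribute on
# <pre> elements to `data-canonical-lang`.
def convert_code_language(html)
  # Wrap in a root element so a fragment with several top-level nodes parses.
  doc = REXML::Document.new("<root>#{html}</root>")

  REXML::XPath.match(doc, "//pre").each do |pre|
    lang = pre.attributes["lang"]
    next if lang.nil? || lang.empty?

    pre.add_attribute("data-canonical-lang", lang)
    pre.delete_attribute("lang")
  end

  doc
end

# Later steps can then search on the attribute we control, for example to
# find math blocks (hypothetical helper).
def math_blocks(doc)
  REXML::XPath.match(doc, "//pre").select do |pre|
    pre.attributes["data-canonical-lang"] == "math"
  end
end

html = '<pre lang="math"><code>a^2 + b^2</code></pre>' \
       '<pre lang="ruby"><code>puts 1</code></pre>'
doc = convert_code_language(html)
puts math_blocks(doc).length
```

Because the lookup depends only on data-canonical-lang, it no longer matters which attribute the underlying markdown parser happens to emit.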

To break up and isolate the changes, this MR is the first part: it changes how the language is handled on the backend. The syntax highlighting filter adds the lang attribute back (which it was already doing), so that any frontend code continues to work.

A future MR will tackle the frontend piece.
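The backward-compatibility step, where the syntax highlighting filter restores lang, could look roughly like this. Again this is a sketch under assumptions: restore_lang_attribute is a hypothetical name, and REXML stands in for whatever DOM library the real pipeline uses.

```ruby
require "rexml/document"

# Hypothetical syntax highlighting step: copy `data-canonical-lang` back to
# `lang` so existing frontend code that reads `lang` keeps working.
def restore_lang_attribute(doc)
  REXML::XPath.match(doc, "//pre").each do |pre|
    canonical = pre.attributes["data-canonical-lang"]
    pre.add_attribute("lang", canonical) unless canonical.nil?
  end

  doc
end

doc = REXML::Document.new(
  '<root><pre data-canonical-lang="ruby"><code>puts 1</code></pre></root>'
)
restore_lang_attribute(doc)
puts REXML::XPath.first(doc, "//pre").attributes["lang"]
```

Keeping both attributes for now means the frontend can migrate to data-canonical-lang separately, after which the restored lang could be dropped.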

MR acceptance checklist

This checklist encourages us to confirm that any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Brett Walker