Fix HTML inline elements processing in the client-side Markdown parser

What does this MR do and why?

It fixes a bug that occurs when parsing Markdown documents in which HTML inline elements are direct children of the root document, or of ProseMirror nodes that do not accept inline elements as direct children. The Content Editor uses a library called ProseMirror, which enforces a strict structure in rich text documents via a "document schema". Two of the rules enforced by the schema are:

  • The root element of the document can’t have inline content like text or images as direct children. Text and images should be wrapped in a paragraph.
  • The above condition is also true for tables and lists.

The problem is that, in HTML, text and images are allowed inside tables, lists, and the root document (a concept that is a little bit fuzzy in HTML).
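As a rough illustration of the schema rules above, the sketch below uses ProseMirror's content-expression syntax (`block+`, `inline*`) in a reduced set of node specs. The node names, the reduced schema itself, and the `acceptsInlineChildren` helper are assumptions made for illustration; they are not the Content Editor's actual schema.

```typescript
// Reduced, illustrative node specs. The `content`, `group`, and `inline`
// fields follow ProseMirror's NodeSpec conventions; the node names and the
// exact expressions are assumptions, not the Content Editor's real schema.
const nodeSpecs: Record<string, { content?: string; group?: string; inline?: boolean }> = {
  doc: { content: 'block+' },
  paragraph: { content: 'inline*', group: 'block' },
  bulletList: { content: 'listItem+', group: 'block' },
  listItem: { content: 'paragraph block*' },
  text: { group: 'inline' },
  image: { group: 'inline', inline: true },
};

// Hypothetical helper: true when a node's content expression names the
// inline group, meaning text and images may be direct children.
function acceptsInlineChildren(nodeName: string): boolean {
  return /\binline\b/.test(nodeSpecs[nodeName]?.content ?? '');
}
```

Under these assumed specs, `acceptsInlineChildren('paragraph')` is true, while the same call for `doc` or `listItem` is false, which is why inline HTML found directly under those nodes needs a paragraph wrapper.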

How does this MR fix the bug?

We fix this problem by detecting inline elements that violate the document schema restrictions, and we wrap them in a paragraph 🤷. If the Markdown parser produces an AST with the following shape:

{
  "type": "root",
  "children": [
    {
      "type": "html",
      "value": "<img src=\"bar\" title=\"foo\">"
    }
  ]
}

the fix transforms it into an AST in which the inline element is wrapped in a paragraph:

{
  "type": "root",
  "children": [
    {
      "type": "paragraph",
      "children": [
        {
           "type": "html",
           "value": "<img src=\"bar\" title=\"foo\">"
        }
      ]
    }
  ]
}

The algorithm follows these steps:

  • It detects an inline element.
  • It creates a paragraph node if and only if the inline element’s previous sibling is not a paragraph node.
  • It adds the inline element to the paragraph node.
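The steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this MR: the `AstNode` shape, the `BLOCK_ONLY_PARENTS` set, the `isInline` test, and the `wrapInlineChildren` name are all hypothetical.

```typescript
// Minimal mdast-like node shape; only the fields this sketch needs.
interface AstNode {
  type: string;
  value?: string;
  children?: AstNode[];
}

// Assumption: parent types that cannot hold inline content directly.
const BLOCK_ONLY_PARENTS = new Set(['root', 'listItem', 'tableCell']);

// Assumption: a naive inline test. Text and image nodes count as inline,
// as do html nodes that start with one of a few well-known inline tags.
const INLINE_HTML = /^<\/?(img|i|em|b|strong|span|a|code)\b/i;

function isInline(node: AstNode): boolean {
  if (node.type === 'text' || node.type === 'image') return true;
  return node.type === 'html' && INLINE_HTML.test(node.value ?? '');
}

// Returns a copy of the tree in which inline nodes found directly under a
// block-only parent are grouped into paragraph nodes.
function wrapInlineChildren(parent: AstNode): AstNode {
  if (!parent.children) return parent;
  const children = parent.children.map(wrapInlineChildren);
  if (!BLOCK_ONLY_PARENTS.has(parent.type)) return { ...parent, children };

  const wrapped: AstNode[] = [];
  for (const child of children) {
    // Step 1: detect an inline element; everything else passes through.
    if (!isInline(child)) {
      wrapped.push(child);
      continue;
    }
    const previous = wrapped[wrapped.length - 1];
    if (previous?.type === 'paragraph') {
      // Step 3: the previous sibling is a paragraph, so append to it.
      wrapped[wrapped.length - 1] = {
        ...previous,
        children: [...(previous.children ?? []), child],
      };
    } else {
      // Step 2: no paragraph precedes the inline element, so create one.
      wrapped.push({ type: 'paragraph', children: [child] });
    }
  }
  return { ...parent, children: wrapped };
}
```

Because a paragraph is reused whenever it immediately precedes an inline element, consecutive inline nodes such as `<i class="foo">`, `*bar*`, and `</i>` end up in a single paragraph instead of three.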

Screenshots or screen recordings

[Screen recording: 2022-06-01_11.12.22]

How to set up and validate locally

  • Enable the preserve_unchanged_markdown feature flag.

  • Update or create a wiki page.

  • Enter the following Markdown snippet in the Classic Markdown Editor:

    - List item with an image ![bar](foo.png)
    
    <i class="foo">
      *bar*
    </i>
    
    <img src="bar" alt="foo" />
    
  • Switch to "Rich text" mode.

  • The elements should render correctly, even though the images won’t load.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Enrique Alcántara