Markdown filter: Text with inline HTML tags is split into fragmented translation units

*Created by: Kuro Kurosaka*

Markdown allows embedded HTML elements such as:

Let's throw in a <b>tag Translatable</b> to see what happen Translatable

or

<a href="http://www.youtube.com/watch?feature=player_embedded&v=YOUTUBE_VIDEO_ID_HERE" target="_blank"><img src="http://img.youtube.com/vi/YOUTUBE_VIDEO_ID_HERE/0.jpg" alt="IMAGE ALT TEXT HERE" width="240" height="180" border="10" /></a>

Each of these should produce a single trans-unit when extracted to XLIFF, but in practice each one is split into multiple fragmented trans-units.

The first sample becomes 5 trans-units, of which only 3 contain translatable text:

  1. Let's throw in a
  2. <bx id="1"/>
  3. tag Translatable
  4. <ex id="1"/>
  5. to see what happen Translatable

The second sample becomes 4 trans-units, of which only 1 contains translatable text:

  1. <bx id="1"/>
  2. IMAGE ALT TEXT HERE
  3. <x id="1"/>
  4. <ex id="1"/>

This is likely caused by the use of the HTML subfilter to process inline HTML elements; more care needs to be taken when merging the events coming from the HTML subfilter.
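To make the expected result concrete, here is a minimal Python sketch of what "one trans-unit per paragraph" could look like for the first sample. It is not the Okapi implementation and does not use any Okapi API; the class `InlineCodeCollector` and the function `to_single_unit` are hypothetical names introduced only for illustration. It uses Python's standard `html.parser` to turn inline tags into XLIFF 1.2 style placeholders while keeping all text in one segment. The id numbering is an assumption and may differ from what the filter actually emits, and translatable attributes such as `alt` are not extracted here.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class InlineCodeCollector(HTMLParser):
    """Hypothetical helper: gathers text and inline tags into one segment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # pieces of the single trans-unit source
        self.open_ids = []   # stack of ids for currently open tags
        self.next_id = 1

    def handle_starttag(self, tag, attrs):
        # Opening tag: emit a <bx/> placeholder and remember its id.
        self.parts.append(f'<bx id="{self.next_id}"/>')
        self.open_ids.append(self.next_id)
        self.next_id += 1

    def handle_endtag(self, tag):
        # Closing tag: reuse the id of the most recently opened tag.
        if self.open_ids:
            self.parts.append(f'<ex id="{self.open_ids.pop()}"/>')

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <img ... />: emit a standalone <x/>.
        self.parts.append(f'<x id="{self.next_id}"/>')
        self.next_id += 1

    def handle_data(self, data):
        # Plain text stays inline, escaped for use inside XML.
        self.parts.append(escape(data))


def to_single_unit(paragraph: str) -> str:
    """Return one source string for the whole paragraph (assumed target shape)."""
    collector = InlineCodeCollector()
    collector.feed(paragraph)
    collector.close()
    return "".join(collector.parts)


sample = "Let's throw in a <b>tag Translatable</b> to see what happen Translatable"
print(to_single_unit(sample))
# Let's throw in a <bx id="1"/>tag Translatable<ex id="1"/> to see what happen Translatable
```

The point of the sketch is only that the inline codes and the surrounding text end up in the same unit instead of five separate ones; how the real filter should merge subfilter events is left open.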
