Markdown filter: Text with inline HTML tags becomes fragmented trans-units
*Created by: Kuro Kurosaka*
Markdown allows embedded HTML elements, such as:
Let's throw in a <b>tag Translatable</b> to see what happen Translatable
or
<a href="http://www.youtube.com/watch?feature=player_embedded&v=YOUTUBE_VIDEO_ID_HERE" target="_blank"><img src="http://img.youtube.com/vi/YOUTUBE_VIDEO_ID_HERE/0.jpg" alt="IMAGE ALT TEXT HERE" width="240" height="180" border="10" /></a>
Each of these should generate one trans-unit in XLIFF when extracted, but in practice each is split into multiple fragmented trans-units.
The first sample becomes 5 trans-units (only 3 of which contain translatable text):
Let's throw in a<bx id="1"/>tag Translatable<ex id="1"/>to see what happen Translatable
The second sample becomes 4 trans-units (only 1 of which contains translatable text):
<bx id="1"/>IMAGE ALT TEXT HERE<x id="1"/><ex id="1"/>
This is likely due to the use of the HTML subfilter to process inline HTML elements; more care needs to be taken when merging the events from the HTML subfilter.
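To make the expected behavior concrete, here is a minimal Python sketch (not Okapi code) of the idea that inline HTML tags should become inline codes inside a single unit rather than unit boundaries. The function name `to_single_unit`, the regular expression, and the id numbering are assumptions made for illustration; the sketch does not extract translatable attributes such as `alt`, and it treats any tag without a trailing `/` as an opening tag.

```python
import re

# Simplified tag matcher (an assumption for this sketch, not a full HTML parser):
# group 1 = "/" for a closing tag, group 2 = tag name, group 3 = "/" for a self-closing tag.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")


def to_single_unit(line: str) -> str:
    """Replace inline HTML tags with XLIFF 1.2-style placeholder codes.

    The whole line stays one string, i.e. one unit: tags become
    <bx/>, <ex/>, or <x/> codes instead of splitting the text.
    """
    out = []
    open_ids = []  # stack of ids for paired tags that are still open
    next_id = 1
    pos = 0
    for m in TAG_RE.finditer(line):
        out.append(line[pos:m.start()])  # text before the tag is kept as-is
        closing, _name, self_closing = m.groups()
        if closing and open_ids:
            # Closing tag: reuse the id of the most recently opened tag.
            out.append(f'<ex id="{open_ids.pop()}"/>')
        elif self_closing or closing:
            # Self-closing tag (or an unmatched closing tag): standalone code.
            out.append(f'<x id="{next_id}"/>')
            next_id += 1
        else:
            # Opening tag: remember its id so the closing tag can refer to it.
            open_ids.append(next_id)
            out.append(f'<bx id="{next_id}"/>')
            next_id += 1
        pos = m.end()
    out.append(line[pos:])  # trailing text after the last tag
    return "".join(out)


sample = "Let's throw in a <b>tag Translatable</b> to see what happen Translatable"
print(to_single_unit(sample))
```

For the first sample above, this keeps all of the text in one string with the `<b>` pair represented as a `<bx/>`/`<ex/>` pair, which is the shape a single trans-unit would be expected to have. How the real filter should merge HTML subfilter events is a separate question that this sketch does not address.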