Markdown filter: text with inline HTML tags is split into fragmented trans-units

*Created by: Kuro Kurosaka*

Markdown allows embedded HTML elements, such as:

`Let's throw in a <b>tag Translatable</b> to see what happen Translatable`

`<a href="http://www.youtube.com/watch?feature=player_embedded&v=YOUTUBE_VIDEO_ID_HERE" target="_blank"><img src="http://img.youtube.com/vi/YOUTUBE_VIDEO_ID_HERE/0.jpg" alt="IMAGE ALT TEXT HERE" width="240" height="180" border="10" /></a>`

Each of these should produce a single trans-unit in the extracted XLIFF, but in practice each one is split into several fragmented trans-units.

The first sample becomes 5 trans-units, only 3 of which contain translatable text:

- `Let's throw in a`
- `<bx id="1"/>`
- `tag Translatable`
- `<ex id="1"/>`
- `to see what happen Translatable`

The second sample becomes 4 trans-units, only 1 of which contains translatable text:

- `<bx id="1"/>`
- `IMAGE ALT TEXT HERE`
- `<x id="1"/>`
- `<ex id="1"/>`

This is likely caused by using the HTML subfilter to process inline HTML elements. More care is needed when merging the events returned by the HTML subfilter, so that inline tags stay inside the surrounding text unit instead of splitting it.
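The merging step described above might look roughly like the following. This is a minimal, hypothetical sketch, not Okapi's API: `Fragment`, the fragment kinds, `render`, and `merge_inline` are invented names, and it assumes the subfilter output has already been flattened into a sequence of text runs, inline codes, and block boundaries. Inline codes are accumulated into the current unit rather than ending it, and a unit is emitted only if it contains non-whitespace text, so code-only fragments such as `<ex id="1"/>` never become trans-units of their own.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical fragment kinds (NOT Okapi API names).
TEXT = "text"          # a run of translatable text
OPEN = "open"          # opening inline tag, e.g. <b>       -> <bx/>
CLOSE = "close"        # closing inline tag, e.g. </b>      -> <ex/>
ISOLATED = "isolated"  # standalone inline tag, e.g. <img/> -> <x/>
BREAK = "break"        # block boundary: ends the current unit


@dataclass
class Fragment:
    kind: str
    value: str = ""  # the text for TEXT, the code id for tag kinds


def render(frag: Fragment) -> str:
    """Render a fragment as XLIFF 1.2-style inline content."""
    if frag.kind == TEXT:
        return frag.value
    if frag.kind == OPEN:
        return f'<bx id="{frag.value}"/>'
    if frag.kind == CLOSE:
        return f'<ex id="{frag.value}"/>'
    return f'<x id="{frag.value}"/>'


def merge_inline(fragments: List[Fragment]) -> List[str]:
    """Merge runs of inline fragments into units; drop code-only units."""
    units: List[str] = []
    current: List[Fragment] = []

    def flush() -> None:
        # Keep the unit only if it carries non-whitespace text.
        if any(f.kind == TEXT and f.value.strip() for f in current):
            units.append("".join(render(f) for f in current).strip())
        current.clear()

    for frag in fragments:
        if frag.kind == BREAK:
            flush()
        else:
            current.append(frag)  # inline codes stay inside the unit
    flush()
    return units


if __name__ == "__main__":
    first_sample = [
        Fragment(TEXT, "Let's throw in a "),
        Fragment(OPEN, "1"),
        Fragment(TEXT, "tag Translatable"),
        Fragment(CLOSE, "1"),
        Fragment(TEXT, " to see what happen Translatable"),
    ]
    for unit in merge_inline(first_sample):
        print(unit)
```

Given the fragments of the first sample with the spaces around the tags preserved, this sketch returns the single unit `Let's throw in a <bx id="1"/>tag Translatable<ex id="1"/> to see what happen Translatable` instead of five separate ones.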