XLIFFFilter: `<ph></ph>` is coded as two TagType.PLACEHOLDER codes with TextFragment.MARKER_ISOLATED

Created by: hao

Hi Okapi,

In short:

<ph></ph> is coded as two TagType.PLACEHOLDER codes with TextFragment.MARKER_ISOLATED, and it’s not possible to recover the OPENING/CLOSING tag pair from them when processing TextFragment.codes from Events.

Background:

We’re using Okapi to parse XLIFF and extract the text together with its InlineElements (MARKER_OPENING/MARKER_CLOSING elements) and Markers (MARKER_ISOLATED). We send these to our machine-translation service, get the results back, and merge the translation, along with the InlineElements and Markers, back into Okapi.

Recently we found that the XLIFF tag <ph></ph> is coded as two TagType.PLACEHOLDER codes rather than as an OPENING and a CLOSING code. We understand that <ph> stands for placeholder, but in this case, where the <ph> element has content, should it be an OPENING and CLOSING pair?

Details:

XLIFF:

<source>
    <g ctype="x-html-p" id="1" dgo:tag_name="p">
        <ph ctype="image" id="2" htm:src="B9BD5C75F6951B0.gif" htm:width="350" htm:height="350" htm:border="2px solid rgb(255, 0, 0)" htm:float="left" htm:margin="10px">
            <sub ctype="x-html-img-alt">display</sub>
        </ph>
    </g>
</source>
<target xml:lang="de-DE">
    <ph ctype="image" id="2" htm:src="B9BD5C75F6951B0.gif" htm:width="350" htm:height="350" htm:border="2px solid rgb(255, 0, 0)" htm:float="left" htm:margin="10px">
        <sub ctype="x-html-img-alt">
            <g ctype="x-html-p" id="1" dgo:tag_name="p">Anzeige</sub>
        </ph>
    </g>
</target>

TextFragment.codes of the source section:

0 = {Code@21675} ""
 tagType = {TextFragment$TagType@21682} "OPENING"
 outerData = {StringBuilder@21685} "<g ctype="x-html-p" id="1" dgo:tag_name="p">"
1 = {Code@21656} ""
 tagType = {TextFragment$TagType@21244} "PLACEHOLDER"
 outerData = {StringBuilder@21693} "<ph ctype="image" id="2" htm:src="B9BD5C75F6951B0.gif" htm:width="350" htm:height="350" htm:border="2px solid rgb(255, 0, 0)" htm:float="left" htm:margin="10px"><sub ctype="x-html-img-alt">"
2 = {Code@21676} ""
 tagType = {TextFragment$TagType@21244} "PLACEHOLDER"
 outerData = {StringBuilder@21700} "</sub></ph>"
3 = {Code@21677} ""
 tagType = {TextFragment$TagType@21721} "CLOSING"
 outerData = {StringBuilder@21724} "</g>"

Questions:

  1. Should <ph></ph> be coded as "PLACEHOLDER", or as "OPENING" and "CLOSING" tags?
  2. If it’s right to code <ph></ph> as PLACEHOLDERs, is it possible to retain the OPENING and CLOSING information in the TextFragment?
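
To make question 2 more concrete, here is a minimal Java sketch of the kind of workaround we would otherwise need on our side: re-pairing PLACEHOLDER codes by looking at their outer data. `PhPairing`, `SimpleCode`, `Kind` and `repair` are hypothetical stand-ins written for this illustration, not Okapi classes or methods, and the string matching is only an assumption based on the outerData values in the dump above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Okapi classes.
class PhPairing {

    enum Kind { OPENING, CLOSING, PLACEHOLDER }

    // Minimal model of one inline code: its tag type and its raw outer data.
    static final class SimpleCode {
        final Kind kind;
        final String outerData;

        SimpleCode(Kind kind, String outerData) {
            this.kind = kind;
            this.outerData = outerData;
        }
    }

    // Returns a copy of the list in which a PLACEHOLDER that starts a <ph>
    // and a later PLACEHOLDER that ends with </ph> are re-typed as an
    // OPENING/CLOSING pair. Codes that cannot be matched are left unchanged.
    static List<SimpleCode> repair(List<SimpleCode> codes) {
        List<SimpleCode> result = new ArrayList<>(codes);
        Deque<Integer> openPh = new ArrayDeque<>(); // indices of unmatched <ph ...> codes
        for (int i = 0; i < result.size(); i++) {
            SimpleCode code = result.get(i);
            if (code.kind != Kind.PLACEHOLDER) {
                continue;
            }
            String data = code.outerData.trim();
            boolean startsPh = data.startsWith("<ph ") || data.startsWith("<ph>");
            boolean endsPh = data.endsWith("</ph>");
            if (startsPh && !endsPh && !data.endsWith("/>")) {
                // First half of a split <ph>: remember it until its end shows up.
                openPh.push(i);
            } else if (endsPh && !startsPh && !openPh.isEmpty()) {
                // Second half: re-type both halves as a pair.
                int open = openPh.pop();
                result.set(open, new SimpleCode(Kind.OPENING, result.get(open).outerData));
                result.set(i, new SimpleCode(Kind.CLOSING, code.outerData));
            }
        }
        return result;
    }
}
```

Applied to the four codes listed above, this would re-type codes 1 and 2 as an OPENING/CLOSING pair and leave the surrounding <g> codes untouched. It depends on fragile string matching, though, which is why we would prefer to get the pairing information from the filter itself.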

Thank you very much!

Hao