XLIFFFilter: `<ph></ph>` are coded as two TagType.PLACEHOLDER with TextFragment.MARKER_ISOLATED
Created by: hao
Hi Okapi,
In short:
<ph></ph> is coded as two TagType.PLACEHOLDER codes with TextFragment.MARKER_ISOLATED, and we cannot recover the OPENING/CLOSING pairing from them when processing TextFragment.codes from Events.
Background:
We’re using Okapi to parse XLIFF, extract the text together with InlineElements (MARKER_OPENING/MARKER_CLOSING elements) and Markers (MARKER_ISOLATED), send them to our machine translation service, get the results back, and merge the translation with the InlineElements and Markers back into Okapi.
Recently we found that the XLIFF tag <ph></ph> is coded as two TagType.PLACEHOLDER codes rather than as an OPENING and a CLOSING code. We understand that <ph> stands for PLACEHOLDER, but in this case (), should it be an OPENING and CLOSING element?
Details:
XLIFF:
<source>
<g ctype="x-html-p" id="1" dgo:tag_name="p">
<ph ctype="image" id="2" htm:src="B9BD5C75F6951B0.gif" htm:width="350" htm:height="350" htm:border="2px solid rgb(255, 0, 0)" htm:float="left" htm:margin="10px">
<sub ctype="x-html-img-alt">display</sub>
</ph>
</g>
</source>
<target xml:lang="de-DE">
<ph ctype="image" id="2" htm:src="B9BD5C75F6951B0.gif" htm:width="350" htm:height="350" htm:border="2px solid rgb(255, 0, 0)" htm:float="left" htm:margin="10px">
<sub ctype="x-html-img-alt">
<g ctype="x-html-p" id="1" dgo:tag_name="p">Anzeige</sub>
</ph>
</g>
</target>
TextFragment.codes of the source section:
0 = {Code@21675} ""
tagType = {TextFragment$TagType@21682} "OPENING"
outerData = {StringBuilder@21685} "<g ctype="x-html-p" id="1" dgo:tag_name="p">"
1 = {Code@21656} ""
tagType = {TextFragment$TagType@21244} "PLACEHOLDER"
outerData = {StringBuilder@21693} "<ph ctype="image" id="2" htm:src="B9BD5C75F6951B0.gif" htm:width="350" htm:height="350" htm:border="2px solid rgb(255, 0, 0)" htm:float="left" htm:margin="10px"><sub ctype="x-html-img-alt">"
2 = {Code@21676} ""
tagType = {TextFragment$TagType@21244} "PLACEHOLDER"
outerData = {StringBuilder@21700} "</sub></ph>"
3 = {Code@21677} ""
tagType = {TextFragment$TagType@21721} "CLOSING"
outerData = {StringBuilder@21724} "</g>"
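To illustrate the pairing we would like to recover from the codes above, here is a minimal sketch of a possible workaround. It does not use the Okapi API: `SimpleCode`, `TagType` and `effectiveTypes` are hypothetical stand-ins that only model the tagType and outerData values shown in the dump. It assumes that a split <ph> can be recognized purely from its outerData (the first half starts with "<ph" and has no "</ph>", the second half ends with "</ph>"), which may not hold for every document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class PhPairing {

    // Stand-in for the tag types seen in the dump; not Okapi's TextFragment.TagType.
    public enum TagType { OPENING, CLOSING, PLACEHOLDER }

    // Hypothetical holder for the (tagType, outerData) values read from each code.
    public static final class SimpleCode {
        final TagType tagType;
        final String outerData;

        public SimpleCode(TagType tagType, String outerData) {
            this.tagType = tagType;
            this.outerData = outerData;
        }
    }

    // Assumption: the first half of a split <ph> starts with "<ph", is not
    // self-closing, and does not contain its own "</ph>".
    static boolean isPhStart(SimpleCode c) {
        if (c.tagType != TagType.PLACEHOLDER || !c.outerData.startsWith("<ph")) {
            return false;
        }
        int gt = c.outerData.indexOf('>');
        boolean selfClosing = gt > 0 && c.outerData.charAt(gt - 1) == '/';
        return !selfClosing && !c.outerData.contains("</ph>");
    }

    // Assumption: the second half ends with "</ph>" and does not open a new <ph>.
    static boolean isPhEnd(SimpleCode c) {
        return c.tagType == TagType.PLACEHOLDER
                && c.outerData.endsWith("</ph>")
                && !c.outerData.startsWith("<ph");
    }

    // Returns one tag type per code, relabelling matched <ph> halves as
    // OPENING/CLOSING. Unmatched halves keep their original type.
    public static List<TagType> effectiveTypes(List<SimpleCode> codes) {
        List<TagType> result = new ArrayList<>();
        for (SimpleCode c : codes) {
            result.add(c.tagType);
        }
        Deque<Integer> openStarts = new ArrayDeque<>();
        for (int i = 0; i < codes.size(); i++) {
            SimpleCode c = codes.get(i);
            if (isPhStart(c)) {
                openStarts.push(i);
            } else if (isPhEnd(c) && !openStarts.isEmpty()) {
                result.set(openStarts.pop(), TagType.OPENING);
                result.set(i, TagType.CLOSING);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The four codes from the dump, with the <ph> attributes shortened.
        List<SimpleCode> codes = Arrays.asList(
                new SimpleCode(TagType.OPENING, "<g ctype=\"x-html-p\" id=\"1\">"),
                new SimpleCode(TagType.PLACEHOLDER,
                        "<ph ctype=\"image\" id=\"2\"><sub ctype=\"x-html-img-alt\">"),
                new SimpleCode(TagType.PLACEHOLDER, "</sub></ph>"),
                new SimpleCode(TagType.CLOSING, "</g>"));
        System.out.println(effectiveTypes(codes));
    }
}
```

For the four codes above this prints [OPENING, OPENING, CLOSING, CLOSING]. Since it depends on string matching against outerData, it is fragile, which is why we would prefer to get this information from the filter itself.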
Question:
- Should <ph></ph> be coded as "PLACEHOLDER", or as "OPENING" and "CLOSING" tags?
- If it’s right to make <ph></ph> PLACEHOLDERs, is it possible to recover the OPENING and CLOSING information from the TextFragment?
Thank you very much!
Hao