Code simplification is incompatible with the merge step
Please consider the following XML markup:
<?xml version="1.0" encoding="utf-8"?>
<document>
<block>
In block
<link>
<p.text>In link
<y.enum/>
</p.text>
</link>
</block>
</document>
With the following ITS filter configuration:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsx="http://www.w3.org/2008/12/its-extensions" xmlns:okp="okapi-framework:xmlfilter-options" xmlns:xlink="http://www.w3.org/1999/xlink" its:translate="no" version="1.0">
<!-- inline tags, that may contain content -->
<its:translateRule selector="//p.text" translate="yes"/>
<its:withinTextRule selector="//p.text" withinText="yes"/>
<!-- Inline tags, that may contain content, but the content is locked for editing -->
<its:translateRule selector="//link" translate="no"/>
<its:withinTextRule selector="//link" withinText="yes"/>
<okp:options extractUntranslatable="yes"/>
</its:rules>
With the standard code simplification applied, the following extraction can be observed (XLIFF):
<trans-unit id="1">
<source xml:lang="en-US"> In block <it id="1" ctype="link" pos="open"><link>
<p.text></it>In link </source>
<target xml:lang="fr-FR"> In block <it id="1" ctype="link" pos="open"><link>
<p.text></it>In link </target>
</trans-unit>
<trans-unit id="2">
<source xml:lang="en-US"> <it id="2" ctype="x-p.text" pos="close"></p.text>
</link></it> </source>
<target xml:lang="fr-FR"> <it id="2" ctype="x-p.text" pos="close"></p.text>
</link></it> </target>
</trans-unit>
And it is merged as:
<?xml version="1.0" encoding="UTF-8"?>
<document>
<block>In block <link>In link<y.enum/>
</block>
</document>
Please note the absence of the closing </link> tag and both <p.text> and </p.text> tags.
The merge step copies code metadata from the source, which may contain more codes than the target because the source codes have not been simplified. At that point, outer data is copied to each target code only from the matching source code. As a result, consecutive codes that were merged during simplification receive the outer data of the first source code only, and the outer data of the remaining codes is lost.
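The mechanism described above can be illustrated with a small Python model. This is only a sketch of the suspected behavior, not the actual Okapi code: the `Code` class and the `simplify` and `merge_copying_from_source` functions are hypothetical names, and the model assumes that simplification keeps the id of the first code in a run of adjacent codes.

```python
from dataclasses import dataclass


@dataclass
class Code:
    id: int
    outer_data: str


def simplify(codes):
    """Merge one run of adjacent codes into a single code (hypothetical model).

    Assumption: every code in the list is adjacent to the next one, and the
    merged code keeps the id of the first code in the run.
    """
    if not codes:
        return []
    return [Code(codes[0].id, "".join(c.outer_data for c in codes))]


def merge_copying_from_source(source_codes, target_codes):
    """Rebuild target codes by copying outer data from the source code with the same id."""
    by_id = {c.id: c for c in source_codes}
    return [Code(t.id, by_id[t.id].outer_data) for t in target_codes if t.id in by_id]


# Unsimplified source codes, as the merge step would see them after re-filtering.
source = [Code(1, "<link>"), Code(2, "<p.text>")]

# Simplified target codes, as they would come back from the XLIFF.
target = simplify(source)

# Only the outer data of the first source code survives the merge.
merged = merge_copying_from_source(source, target)
```

In this model the simplified target code carries `<link><p.text>`, but after the merge only `<link>` is left, which is consistent with the missing `<p.text>` in the merged document above.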
If I understand the concept correctly, any code-related changes should be made at the filter level only. So either all code simplification steps have to be moved into the filters (this looks preferable, even though there are up to 50 filters now), or the merge step has to be aware of such simplifications and perform them right after filtering (this might be tricky). Either way is a fundamental change and may require significant effort.
I think the first variant can be broken into pieces by making the code simplification step aware of whether the filter already handles this in the new way (for example, by checking for the availability of a mergeAdjacentCodes parameter). That way, the solution can be shipped gradually.
What is more, a recent enhancement for the IDML filter (#1415 (closed)) can be considered a precedent for moving forward with the first variant.
Please be aware that there is a workaround: excluding specific codes from merging. Here are the filter-level code simplification rules that should help with the example above (just add them inside <its:rules> </its:rules>):
<okp:simplifierRules>
if DATA ~ "(<|&lt;)/?(p\\.text|link)(.*?)(>|&gt;)";
</okp:simplifierRules>
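To check which codes the rule's pattern is meant to cover, the regular expression can be tried out in Python. This is a sketch under two assumptions that are not confirmed by the report: that `\\.` in the rule string unescapes to the regex `\.`, and that the pattern is applied to the whole code data. The `is_excluded` helper is a hypothetical name.

```python
import re

# Assumption: "\\." in the simplifier rule unescapes to the regex "\.".
PATTERN = re.compile(r"(<|&lt;)/?(p\.text|link)(.*?)(>|&gt;)")


def is_excluded(data: str) -> bool:
    """Return True if the code data matches the pattern from the rule.

    Assumption: the whole code data has to match (full match).
    """
    return PATTERN.fullmatch(data) is not None


# Opening and closing tags of both elements match, raw or escaped.
samples = ["<link>", "</link>", "<p.text>", "</p.text>", "&lt;link&gt;", "<y.enum/>"]
results = {s: is_excluded(s) for s in samples}
```

Under these assumptions the `link` and `p.text` tags match in both their opening and closing forms, while `<y.enum/>` does not.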
The merged document should look like:
<?xml version="1.0" encoding="UTF-8"?>
<document>
<block>In block <link>
<p.text>In link<y.enum/></p.text>
</link></block>
</document>