ITS filtering messes up EOL in target files
Preconditions
- A source XML file with DOS line endings (CRLF) and ITS filter properties (some text included and some text excluded for the target language), included in the example packages attached.
- The ITS filter properties in the provided source file include <Option D> for fr-ZZ and exclude <Option E> and Option F, so only the first <label> node is in scope for translation for the fr-ZZ locale:

  <file xmlns:html5="html5"
        xmlns:its="http://www.w3.org/2005/11/its"
        xmlns:m="http://www.w3.org/1998/Math/MathML"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <sharedstimuli></sharedstimuli>
    <items>
      <item>
        <label its:localeFilterList="fr-ZZ" its:localeFilterType="include">
          <text>&lt;Option D&gt;</text>
        </label>
        <label its:localeFilterList="fr-ZZ" its:localeFilterType="exclude">
          <text>&lt;Option E&gt;</text>
        </label>
        <label its:localeFilterList="fr-ZZ" its:localeFilterType="exclude">
          <text>Option F</text>
        </label>
      </item>
    </items>
  </file>

- A translation kit with the source file above and fr-ZZ as the target language.
Tools
- Okapi Rainbow 1.45
- OmegaT 6.1.0 (latest version) with the OmegaT filter plugin okapiFiltersForOmegaT-1.15-1.47.0.jar (latest as of now) or any earlier version
Steps to reproduce in Rainbow
- Unzip the mixed_eol_okapi_project.zip bundle attached
- Open the manifest.rkm file in Rainbow (1.47)
- Run Utilities > Translation Kit Post-Processing > Execute to produce the target file
- Examine line endings in the target file (i.e. pack1/done/dummy-source-crlf.out.xml)
Steps to reproduce in OmegaT
- Unpack¹ the OmegaT packed project attached, i.e. mixed_eol_omegat_project.omt
- In OmegaT, create target files (e.g. you may press Ctrl+D)
- Examine line endings in the same target file (i.e. target/dummy-source-crlf_fr-ZZ.xml)
- Optionally, compare with the other two files
Expected results
The line endings are homogeneous in the target file.
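To make the "examine line endings" step repeatable, a small script can count each kind of line ending in a file. The following is only a sketch: the function names are hypothetical, and it assumes the file is read as raw bytes so that no newline translation happens on the way in.

```python
import re
from collections import Counter

# Longest alternative first, so that "\r\r\n" is counted as one CRCRLF
# ending instead of a lone CR followed by a CRLF.
EOL_PATTERN = re.compile(rb"\r\r\n|\r\n|\r|\n")

EOL_NAMES = {b"\r\r\n": "CRCRLF", b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_eols(data: bytes) -> Counter:
    """Count every kind of line ending found in the raw bytes."""
    return Counter(EOL_NAMES[m.group()] for m in EOL_PATTERN.finditer(data))


def is_homogeneous(data: bytes) -> bool:
    """True if the data uses at most one kind of line ending."""
    return len(count_eols(data)) <= 1
```

For example, `count_eols(pathlib.Path("target/dummy-source-crlf_fr-ZZ.xml").read_bytes())` would list the endings of the OmegaT target file; the expected result corresponds to `is_homogeneous(...)` returning True.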
Actual results
The line endings are mixed, depending on whether the text node was filtered in or filtered out by the ITS filter properties (and depending on which localization tool was used). The issue happens in the target file generated with the Okapi filter in OmegaT as well as when merging XLIFF at the post-processing step using Okapi Rainbow.
In Rainbow, all lines have a CRLF (DOS) ending except ITS-included lines, which end in CRCRLF (a mix of Mac and DOS line endings).
In OmegaT, it seems to be the opposite: all lines have a CRLF (DOS) ending except ITS-excluded lines, which end in CRCRLF (a mix of Mac and DOS line endings).
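To check which nodes are affected in a given target file, the lines that end in CRCRLF can be listed with their line numbers and compared against the ITS-included and ITS-excluded labels. This is a rough sketch: the function name is hypothetical, and the sample bytes are constructed by hand to mimic the Rainbow behaviour described above; they are not taken from the attached files.

```python
import re


def lines_ending_with_crcrlf(data: bytes) -> list[tuple[int, str]]:
    """Return (line number, text) for each line terminated by CR CR LF."""
    hits = []
    # Split on LF only, so any stray CR stays attached to its own line.
    for number, match in enumerate(re.finditer(rb"[^\n]*\n", data), start=1):
        line = match.group()
        if line.endswith(b"\r\r\n"):
            text = line.rstrip(b"\r\n").decode("utf-8", errors="replace")
            hits.append((number, text.strip()))
    return hits


# Hand-made sample (assumption): only the translated text line is affected.
sample = (
    b'<label its:localeFilterList="fr-ZZ" its:localeFilterType="include">\r\n'
    b"<text>Option D</text>\r\r\n"
    b"</label>\r\n"
)
print(lines_ending_with_crcrlf(sample))
```

Running the same function on the Rainbow and OmegaT target files should show whether the affected lines match the included or the excluded nodes in each tool.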

More info
This is the root cause of a blocking error in OmegaT team projects hosted in git repositories, which use the Okapi XML filter and contain files that have some nodes excluded in the target language of the project by ITS filter properties.
This is what happens if the original source files have LF line endings (Unix EOL):
- When OmegaT downloads the project on Unix-like machines (e.g. Linux, Mac), the source files are fetched from the repo just as they are, without EOL alteration: line endings stay as LF.
- When OmegaT downloads the project on Windows, the JGit library built into OmegaT uses DOS line endings, changing LF to CRLF.
- When target files are compiled in OmegaT, the line endings get mixed in the way described above, and the target files are pushed to the repository with the EOL mix.
- Right after the commit and push, OmegaT syncs again, which causes a checkout conflict error that blocks the project.
The only way to get rid of this error is for the translator either to download the project again or to unhide and delete the .repositories folder inside the project, which contains the git clone of the repository.
¹ OMT packages can be unpacked in OmegaT with this plugin: https://github.com/briacp/plugin-omt-package/releases or simply unzipped as if they were zip files.