OpenXML Filter: always expose one of the runs with implicit formatting

Consider an example in which a paragraph contains runs with different formatting, none of which can be factored out into the paragraph formatting.

Below is a corresponding document structure:

<w:p>
    <w:pPr>
        <w:rPr>
            <w:sz w:val="28"/>
        </w:rPr>
    </w:pPr>
    <w:r>
        <w:rPr>
            <w:sz w:val="26"/>
        </w:rPr>
        <w:t xml:space="preserve">Run 13pt.</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:sz w:val="24"/>
        </w:rPr>
        <w:t xml:space="preserve">Run 12pt.</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:sz w:val="28"/>
        </w:rPr>
        <w:t xml:space="preserve">Run 14pt.</w:t>
    </w:r>
</w:p>

And the current extraction is this:

<g id="1">Run 13pt.</g><g id="2">Run 12pt.</g><g id="3">Run 14pt.</g>

This segment contains positions that are outside of any formatting, i.e. before the first <g>, between a </g> and the following <g>, or after the final </g>. Such segments are therefore not easy for translators to work with.

A better segment would be one in which the formatting of one of the runs is assumed (implicit) and only the other runs are wrapped in tags. For instance:

Run 13pt.<g id="1">Run 12pt.</g><g id="2">Run 14pt.</g>

In this case, the styling of every position in the segment is known. Either the position is inside a tag pair, in which case it has the styling of that run, or it is outside a tag pair, in which case it has the styling of the "implicit" tag pair.
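The rule above can be illustrated with a small sketch. This is not the filter's actual code: the `Run` class and the functions `current_extraction` and `implicit_extraction` are hypothetical names introduced only for illustration, and choosing the first run as the implicit one is an assumption taken from the example segment above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a <w:r> element: its text and its w:sz value."""
    text: str
    sz: int  # value of <w:sz w:val="..."/>, in half-points


def current_extraction(runs: List[Run]) -> str:
    """Wrap every run in its own tag pair (the behaviour described as current)."""
    return "".join(
        f'<g id="{i}">{run.text}</g>' for i, run in enumerate(runs, start=1)
    )


def implicit_extraction(runs: List[Run], implicit_index: int = 0) -> str:
    """Leave runs that share the implicit run's formatting untagged.

    Assumption: the implicit formatting is that of runs[implicit_index];
    the issue does not say how the implicit run should be chosen.
    """
    if not runs:
        return ""
    implicit_sz = runs[implicit_index].sz
    parts = []
    next_id = 1
    for run in runs:
        if run.sz == implicit_sz:
            # Same formatting as the implicit run: no tag pair needed.
            parts.append(run.text)
        else:
            parts.append(f'<g id="{next_id}">{run.text}</g>')
            next_id += 1
    return "".join(parts)


runs = [Run("Run 13pt.", 26), Run("Run 12pt.", 24), Run("Run 14pt.", 28)]
print(current_extraction(runs))
print(implicit_extraction(runs))
```

With the three runs from the example paragraph, the first call reproduces the current segment and the second reproduces the proposed one. The receiving side would still need to know which formatting is the implicit one in order to rebuild the runs on merge; that part is not covered by this sketch.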

The example document is attached.
