OpenXML (docx without word/styles.xml)
Created by: Anonymous
Original issue 453 created by andriy.luts...@crowdin.com on 2015-04-07T12:20:37.000Z:
According to ECMA-376, 4th Edition
(and also https://msdn.microsoft.com/en-us/library/aa982683%28v=office.12%29.aspx),
the style definitions part (word/styles.xml) is optional.
This file is usually present in documents generated by LO/OO/MSO, but some applications don't create it.
When we process the attached docx file, the resulting zip file contains only [Content_Types].xml.
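To check whether a given docx is affected, one can list the parts of the package and look for word/styles.xml. Below is a minimal sketch using only java.util.zip. The class name DocxPartCheck and its helper methods are hypothetical and not part of Okapi, and the in-memory package is only a placeholder stand-in for the attached file, not a valid document:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Okapi.
final class DocxPartCheck {

    /** Collects the entry names of a zip package read from the given stream. */
    public static Set<String> listParts(InputStream in) throws IOException {
        Set<String> names = new LinkedHashSet<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Returns true if the optional style definitions part is present. */
    public static boolean hasStylesPart(Set<String> parts) {
        return parts.contains("word/styles.xml");
    }

    /** Builds a tiny in-memory zip with the given entry names (placeholder content only). */
    public static byte[] buildPackage(String... entryNames) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a docx written by an application that omits word/styles.xml.
        byte[] pkg = buildPackage("[Content_Types].xml", "_rels/.rels", "word/document.xml");
        Set<String> parts = listParts(new ByteArrayInputStream(pkg));
        System.out.println("parts: " + parts);
        System.out.println("has word/styles.xml: " + hasStylesPart(parts));
    }
}
```

To inspect a real file, pass a FileInputStream for the docx to listParts() instead of the in-memory package.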
Proposed solution:
In file
okapi/okapi/filters/openxml/src/main/java/net/sf/okapi/filters/openxml/OpenXMLFilter.java, in method nextInZipFile(), inside the condition if(nZipType==MSWORD):
at line 694:
if (sEntryName.equals("word/styles.xml"))
change to:
if (sEntryName.equals("word/document.xml"))
and at lines 704-705:
(sEntryName.equals("[Content_Types].xml") || // but don't do Content_Types
sEntryName.equals("word/styles.xml"))) // and styles a second time
change to:
(sEntryName.equals("[Content_Types].xml") || sEntryName.equals("word/document.xml")))
I can't judge the quality of my "hack", but after this change the attached document was processed successfully.
Tested on okapi's development branch.