Windows: modules/features using OpenCV/OpenSSL/GPerfTools/FFmpeg to read/write files do not work with Unicode characters in file paths
Summary
Modules that use OpenCV to read or write files fail on Windows if the file path contains Unicode (non-ASCII) characters. The same applies to file-related features in OpenSSL (config-server TLS authentication support), Google-PerfTools (Heap and CPU profilers), and FFmpeg (video output).
Steps to reproduce
The following modules are affected:
- calibration (cv::FileStorage, cv::imwrite)
- undistort (cv::FileStorage)
- stereo_rectification (cv::FileStorage)
- image_output (cv::imwrite)
- video_output (FFmpeg avio_open)
The following functionality is affected:
- HeapProfiler (HeapProfilerStart)
- CPUProfiler (ProfilerStart)
- ConfigServer (OpenSSL calls for TLS auth)
What is the current bug behavior?
Reading or writing a file on Windows fails with an error if the file path contains Unicode (non-ASCII) characters.
What is the expected correct behavior?
Reading and writing files works regardless of which characters the file path contains.
Execution environment information
- OS: Windows
- OS version: 10
- DV version: master
- DV installation type:
  - package manager
  - official download
  - manual compile
Possible fixes (optional)
Some cases could be fixed by renaming the file to a temporary one whose name contains no Unicode characters and then reading it, or by writing to such a Unicode-free file and then renaming it. The boost::filesystem unique temporary file functionality would help, together with its rename/copy_file functionality. Care is needed with permissions and with cross-filesystem copies, where a plain rename may fail.
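A minimal sketch of the temporary-file idea follows. It uses std::filesystem (C++17) as a stand-in for boost::filesystem, and every name in it (LegacyFileOp, makeAsciiTempPath, readViaAsciiCopy, writeViaAsciiTemp, the legacy* helpers) is hypothetical and not part of DV. The callbacks represent whatever library call only copes with ASCII paths, for example a wrapper around cv::imwrite. The read side copies instead of renaming so the user's file is never moved.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Callback standing in for a library call that only handles ASCII paths.
using LegacyFileOp = std::function<bool(const std::string &asciiPath)>;

// Builds a random, ASCII-only file name inside the system temp directory.
// Caveat: the temp directory itself may contain non-ASCII characters
// (for example a Windows user name); this sketch does not handle that.
fs::path makeAsciiTempPath(const std::string &asciiExtension) {
    std::random_device rd;
    std::mt19937_64 gen(rd());
    return fs::temp_directory_path() / ("dv_tmp_" + std::to_string(gen()) + asciiExtension);
}

// Read: copy the Unicode-named file to an ASCII-named one and read the copy.
bool readViaAsciiCopy(const fs::path &source, const std::string &asciiExtension,
                      const LegacyFileOp &legacyRead) {
    const fs::path tmp = makeAsciiTempPath(asciiExtension);
    std::error_code ec;
    fs::copy_file(source, tmp, fs::copy_options::overwrite_existing, ec);
    if (ec) {
        return false;
    }
    const bool ok = legacyRead(tmp.string());
    fs::remove(tmp, ec); // best-effort cleanup
    return ok;
}

// Write: let the library write to an ASCII-named file, then move it in place.
bool writeViaAsciiTemp(const fs::path &destination, const std::string &asciiExtension,
                       const LegacyFileOp &legacyWrite) {
    const fs::path tmp = makeAsciiTempPath(asciiExtension);
    std::error_code ec;
    if (!legacyWrite(tmp.string())) {
        fs::remove(tmp, ec);
        return false;
    }
    fs::rename(tmp, destination, ec);
    if (ec) {
        // rename can fail across filesystems: fall back to copy + delete.
        fs::copy_file(tmp, destination, fs::copy_options::overwrite_existing, ec);
        std::error_code removeEc;
        fs::remove(tmp, removeEc);
    }
    return !ec;
}

// Stand-ins for the real library calls, used only to exercise the helpers.
bool legacyWriteText(const std::string &asciiPath, const std::string &text) {
    std::ofstream out(asciiPath, std::ios::binary);
    out << text;
    out.flush();
    return out.good();
}

bool legacyReadText(const std::string &asciiPath, std::string &text) {
    std::ifstream in(asciiPath, std::ios::binary);
    return static_cast<bool>(std::getline(in, text));
}
```

The extension is passed explicitly because libraries such as OpenCV pick the file format from it. Permissions on the temporary file and cleanup when the callback throws are left out of this sketch.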
In other places, the file could be loaded as a normal file through the boost::nowide::fstream API into a memory buffer, which is then passed to library functions that accept buffers instead of file paths. OpenSSL and some OpenCV functions might support this.
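The buffer approach could look roughly like the sketch below. It uses std::ifstream/std::ofstream with std::filesystem::path (C++17) as a stand-in for boost::nowide::fstream, on the assumption that the caller builds the path from our internal UTF-8 string (for example with std::filesystem::u8path). The function names readFileToBuffer and writeBufferToFile are hypothetical. The library calls that would consume or produce the buffer, such as cv::imdecode and cv::imencode in OpenCV or BIO_new_mem_buf in OpenSSL, are only mentioned in comments and would need checking for each affected call site.

```cpp
#include <filesystem>
#include <fstream>
#include <ios>
#include <iterator>
#include <optional>
#include <vector>

// Reads a whole file into memory. The path object keeps the native (wide)
// representation on Windows, so the narrow-string file APIs are not involved.
std::optional<std::vector<unsigned char>> readFileToBuffer(const std::filesystem::path &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::nullopt;
    }
    std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());
    // The buffer could now go to a buffer-based API, e.g. cv::imdecode
    // or BIO_new_mem_buf, instead of handing the path to the library.
    return data;
}

// Writes a buffer (e.g. one filled by cv::imencode) to a file of any name.
bool writeBufferToFile(const std::filesystem::path &path,
                       const std::vector<unsigned char> &data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out.write(reinterpret_cast<const char *>(data.data()),
              static_cast<std::streamsize>(data.size()));
    out.flush();
    return out.good();
}
```

This keeps the third-party libraries away from file paths entirely, at the cost of holding the whole file in memory, which is probably fine for certificates and calibration files but less so for video output.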
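Windows 10 (version 1903, the May 2019 Update, and newer, according to the Microsoft documentation linked below) also seems to have a setting that makes a program always use UTF-8 as its code page, which matches the encoding we use internally. This needs to be explored, including how enabling it affects compatibility with older Windows versions.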
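While exploring the UTF-8 setting, a runtime check like the following sketch could tell whether the process is actually running with the UTF-8 code page, so that any workaround is only applied when needed. GetACP and CP_UTF8 are real Win32 names; the helper names (narrowApisUseUtf8, hasNonAscii, needsPathWorkaround) are hypothetical, and on non-Windows platforms the sketch simply assumes UTF-8.

```cpp
#include <algorithm>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// True when the process "ANSI" code page is UTF-8, i.e. when narrow-string
// Windows APIs are expected to interpret char strings as UTF-8.
bool narrowApisUseUtf8() {
#ifdef _WIN32
    return GetACP() == CP_UTF8;
#else
    return true; // assumption: the issue is Windows-specific
#endif
}

// True if the UTF-8 encoded path contains any byte outside 7-bit ASCII.
bool hasNonAscii(const std::string &utf8Path) {
    return std::any_of(utf8Path.begin(), utf8Path.end(),
                       [](char c) { return static_cast<unsigned char>(c) > 0x7F; });
}

// A path only needs special handling if it is non-ASCII and the
// process is not already running with the UTF-8 code page.
bool needsPathWorkaround(const std::string &utf8Path) {
    return hasNonAscii(utf8Path) && !narrowApisUseUtf8();
}
```

On older Windows versions, where the setting is ignored, needsPathWorkaround would keep returning true for non-ASCII paths, so a fallback would still be required there.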
Links / references (optional)
- http://openssl.6102.n7.nabble.com/Windows-I18N-Unicode-characters-in-cert-filenames-td13786.html
- https://stackoverflow.com/questions/2401059/openssl-with-unicode-paths
- https://pdh11.blogspot.com/2009/05/unicode-is-one-true-god-and-utf-8-is.html
- https://answers.opencv.org/question/142347/does-cvvideocapture-support-both-utf-8-and-traditional-codepage-encoded-strings/
- https://www.boost.org/doc/libs/develop/libs/nowide/doc/html/index.html
- https://utf8everywhere.org/
- https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html
- https://stackoverflow.com/questions/4592261/windows-api-ansi-and-wide-character-strings-is-it-utf8-or-ascii-utf-16-or-u
- https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
- https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
Issue Planning (for developers use only, do not remove)
StartDate: 2020-04-01
DueDate: 2020-04-01