Python exception when UTF-8 characters are encountered on a non-Unicode system.
On my system, which I've generally built to avoid international character sets, appi 0.1.6 ends up using the ascii encoding when reading files, because open() without an explicit encoding falls back to the locale's preferred encoding. This causes appi/python to throw a stack trace (UnicodeDecodeError), since the ascii codec cannot decode non-ASCII bytes (values > 0x7f).
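For anyone who wants to reproduce this without changing their locale, here is a minimal sketch. It is not taken from appi; the file name, the sample content, and the helper names are made up for illustration, and passing encoding='ascii' explicitly is my stand-in for what open() picks up on an ASCII-only locale.

```python
import os
import tempfile

# Hypothetical sample content: any non-ASCII character will trigger the error.
SAMPLE = 'CFLAGS="-O2"  # caf\u00e9\n'


def read_as_ascii(path):
    """Read a file with the ascii codec, as open() would under an ASCII locale."""
    with open(path, 'r', encoding='ascii') as f:
        return f.read()


def demo_failure():
    """Write a UTF-8 file, read it back as ascii, and return the exception raised."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'make.defaults')  # hypothetical file name
        with open(path, 'w', encoding='utf-8') as f:
            f.write(SAMPLE)
        try:
            read_as_ascii(path)
        except UnicodeDecodeError as exc:
            return exc
    return None


if __name__ == '__main__':
    print(repr(demo_failure()))
```

The exception comes from the decode step inside f.read(), not from open() itself, which is why the stack trace points at the read call.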
An example of this occurs for some users of Funtoo Linux's new "ego query version" command, as discussed here: https://forums.funtoo.org/topic/1362-ego-trouble/
As mentioned there, the solution is either to ignore non-ASCII characters when reading files, or to read files with an explicit encoding that can handle those characters. For my own use I simply replaced invalid characters, since that was the simplest way to get the command working, but I'm not convinced that is the best universal solution for everyone who uses the library. I listed a few alternative options on that page, and will include them here as well for completeness:
- use errors='ignore' on open, which silently drops only the bytes that cannot be decoded (the rest of the input is kept, but the loss is invisible)
- use errors='replace' on open, which replaces each undecodable byte with the Unicode replacement character (U+FFFD)
- specify an encoding on open that does not throw an exception on non-ascii characters (for example encoding='utf-8')
- leave the open alone and catch the exception somewhere else
- possibly other options?
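To make the trade-offs concrete, here is a rough sketch that reads the same bytes with each option. It is only an illustration under my own assumptions: the sample content and the helper names are hypothetical, and encoding='ascii' is passed explicitly to imitate an ASCII-only locale.

```python
import os
import tempfile

# Hypothetical file content containing one non-ASCII character (2 bytes in UTF-8).
RAW = 'USE="x"  # caf\u00e9\n'.encode('utf-8')


def read_with(path, **open_kwargs):
    """Read the whole file in text mode with the given open() keyword arguments."""
    with open(path, 'r', **open_kwargs) as f:
        return f.read()


def compare_options():
    """Return what each option from the list above yields for the same file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sample.conf')  # hypothetical file name
        with open(path, 'wb') as f:
            f.write(RAW)
        results = {
            # Undecodable bytes are dropped; everything else is kept.
            'ignore': read_with(path, encoding='ascii', errors='ignore'),
            # Each undecodable byte becomes U+FFFD.
            'replace': read_with(path, encoding='ascii', errors='replace'),
            # An encoding that can represent the data keeps it intact.
            'utf-8': read_with(path, encoding='utf-8'),
        }
        # Leave open() alone and handle the exception at the call site instead.
        try:
            read_with(path, encoding='ascii')
            results['strict'] = 'no error'
        except UnicodeDecodeError:
            results['strict'] = 'UnicodeDecodeError caught'
    return results


if __name__ == '__main__':
    for name, value in compare_options().items():
        print(name, repr(value))
```

Note that encoding='utf-8' can still raise if a file is not valid UTF-8, so it may make sense to combine an explicit encoding with an errors= handler.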
I would be happy to submit my patch if it is determined that is the best solution, but I thought it would be best to discuss it openly with the community before making a commit or pull request.
My personal patch was to appi/conf/profile.py as follows:
--- profile.py 2018-02-17 06:12:06.000000000 -0600
+++ profile.py.modified 2018-02-24 15:52:43.000000000 -0600
@@ -112,7 +112,7 @@
             k: v for k, v in context.items()
             if k in constant.INCREMENTAL_PORTAGE_VARS
         }
-        with open(str(path), 'r') as f:
+        with open(str(path), 'r', errors='replace') as f:
             output_vars = set(re.findall(
                 r'^\s*(?:export\s+)?([a-z][a-z0-9_]*)=', f.read(), re.M | re.I
             ))
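As a sanity check on why replacing characters looked safe to me here: the pattern in the patched code only captures ASCII-style variable names, so replaced characters in comments or values should not change which names are found. A small sketch, assuming that decoding bytes with the ascii codec and errors='replace' behaves like the patched open() call on my system; the function name and sample input are hypothetical.

```python
import re

# Same pattern and flags as in the patched code above.
VAR_RE = re.compile(r'^\s*(?:export\s+)?([a-z][a-z0-9_]*)=', re.M | re.I)


def assigned_vars(raw_bytes):
    """Decode like open(..., errors='replace') under an ASCII locale, then collect names."""
    text = raw_bytes.decode('ascii', errors='replace')
    return set(VAR_RE.findall(text))


if __name__ == '__main__':
    # Hypothetical file content with a non-ASCII character in a comment.
    sample = 'export USE="a"\n# caf\u00e9\nCFLAGS="-O2"\n'.encode('utf-8')
    print(sorted(assigned_vars(sample)))
```

This says nothing about other places in the library where the file contents themselves matter, which is part of why I'd like feedback before sending a pull request.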