LC_CTYPE broken
By egm... on November 09, 2010 23:05 (imported from Google Code)
Followup of issue #281 (closed):
Beginning with r313, LC_CTYPE is set to the charset only rather than to a complete locale specification. This does not work: in Midnight Commander 4.7, for example, accented characters are not displayed properly.
Like many other applications, Midnight Commander first sets up the locale with setlocale() and then calls nl_langinfo() to get the name of the character set. Here is a short test program. It should print the name of the current charset (e.g. "UTF-8"), but in iTerm r313 it prints an empty string instead.
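#include <locale.h>
#include <langinfo.h>
#include <stdio.h>

int main(void) {
    /* Initialize every locale category from LC_ALL / LC_* / LANG. */
    setlocale(LC_ALL, "");
    /* Name of the character set used by the current LC_CTYPE locale. */
    char *codeset = nl_langinfo(CODESET);
    printf("%s\n", codeset);
    return 0;
}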
On my Mac (OS X 10.5.8) I see the following environment variables:
Terminal.app:
    LANG=hu_HU.UTF-8
    (no LC_* variables)
iTerm r312:
    LANG=hu_HU
    LC_COLLATE=hu_HU.UTF-8
    LC_CTYPE=hu_HU.UTF-8
    LC_MESSAGES=hu_HU.UTF-8
    LC_MONETARY=hu_HU.UTF-8
    LC_NUMERIC=hu_HU.UTF-8
    LC_TIME=hu_HU.UTF-8
iTerm r313:
    LANG=hu_HU
    LC_CTYPE=UTF-8
For any "well-behaved" application (one that initializes its locale with setlocale()), these variables mean the following:
LANG is the fallback default.
LC_<individual> overrides LANG for that particular category.
LC_ALL overrides everything.
Less well-behaved applications might inspect some of these environment variables directly and do whatever odd thing they like with them.
Setting LC_CTYPE to UTF-8 is definitely broken; it should contain a full locale name such as hu_HU.UTF-8.
I don't see the point of the old behavior (LC_CTYPE=hu_HU.UTF-8 with LANG=hu_HU) either. I can imagine situations where the missing charset in LANG causes problems (though I'm not sure about this), and if someone happens to unset LC_CTYPE, things will definitely break. I think Terminal.app's approach (LANG=hu_HU.UTF-8 and nothing else) is much cleaner and is the correct one. After all, there is little point in having an override mechanism if the terminal uses it up right away. The terminal should perhaps set only the lowest-priority variable (LANG in this case), so that users still have room to override it.
I'm not sure about the general conventions for setting LANG and friends on the Mac, but on Linux systems I would definitely say that having the terminal itself modify any of these variables is a bad idea. The language should preferably be set globally for the whole desktop environment or, less ideally, inside the application run by the terminal (typically the shell, via .bashrc or .bash_profile), but it's really not the terminal's job. The terminal could use these values to set its own UI language, but it should pass them on to the application unchanged. It would be nice to verify whether Terminal.app behaves this way; if it does, iTerm should too.