Changes

Only a few of the files have been cleaned up so far. I'll continue cleaning, but want to push this up to the server.
Kevin J. McCarthy · 642dcec4
--- a/MuttFaq/Charset.md
+++ b/MuttFaq/Charset.md
+### Umlauts, accents, and other non-ASCII characters are displayed as '?' or '\\123' -- locales
+
+**Short answer:** set `LC_CTYPE=en_US.ISO-8859-1`.
+
+**Long answer:** You have to configure your *locale* settings. This is
+done by setting environment variables.
+
+If your system is already configured, you only have to set `$LANG`
+and/or some of the `$LC_*` variables. `$LANG` is the default for
+any `$LC_*` categories that are unset, and `$LC_ALL` overrides
+all other variables, so make sure `$LC_ALL` is unset. Mutt cares
+mostly about these categories:
+
+  - `LC_CTYPE` is the character set used by your terminal
+  - `LC_MESSAGES` is the language used by the Mutt menus and printed messages
+  - `LC_TIME` is used by *strftime(3)*
+
+We will use `$LANG` here. Example settings:
+
+  - `export LANG=de_DE.UTF-8` (sh/bash syntax; put it in your .bashrc/.bash_profile)
+  - `setenv LANG en_US.ISO-8859-1` (csh/tcsh syntax, for .cshrc/.login)
+
+The triple stands for *language*\_*country*.*charset*. There are also
+variants like *de\_AT@euro*, and aliases like *deutsch*. Check the
+output of `locale -a` to see which locale values are supported by
+your system. Type `locale` to check the current settings of all
+categories.
+
+    $ locale
+    LANG=de_DE.UTF-8
+    LC_CTYPE="de_DE.UTF-8"
+    ...
+
+Finally, verify Mutt correctly detects the charset of the locale.
+Restart Mutt and type:
+
+    :set &charset ?charset
+    charset="utf-8"
+
+Don't forget to empty the Mutt [header
+cache](http://www.mutt.org/doc/devel/manual.html#header-cache) when you
+change the charset if you are running a Mutt older than 1.5.18.
+
+Also, if you built Mutt yourself, it is critical that you link against a
+Unicode-aware ncurses. The package for that is often named ncursesw\*. If
+there is no such package, chances are your system's ncurses already
+includes Unicode support.
+
+**Problem:** If the output of `locale --all-locales` is empty, or lacks a
+suitable value, you have to generate the locale files first. Check the
+*localedef(1)* manpage on how to do this. Debian users simply call
+`dpkg-reconfigure locales` (make sure the *locales* package is
+installed).
+
+**Further problems:**
+
+  - Some systems (Mac OS X before 10.4, NetBSD...) have no `locale` command installed.
+    Use something like `ls /usr/share/locale/` or `ls /usr/lib/locale/` to list available values.
+  - Some systems (libc5...) have no way to tell Mutt the locale's charset.
+    You have to set the `$charset` variable in muttrc yourself.
+  - Some systems (HP-UX, AIX, OSF1, Irix...) use non-standard names for some charsets.
+    Use iconv-hooks to alias them to standard names. Example files come with the Mutt tarball.
+  - Some systems (Cygwin...) have no working locales.
+    Use the `--enable-locales-fix` configure option and set `$charset` yourself,
+    but be prepared for some limitations in functionality.
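+
+As a minimal sketch of the second and third workarounds, the muttrc lines
+below set the charset by hand and alias a system-specific charset name.
+Both values are assumptions for illustration: pick the charset your
+terminal really uses, and note that `iso88591` is a hypothetical local
+name, not taken from any particular system.
+
+```
+# Tell Mutt the terminal charset when the locale cannot (assumed value)
+set charset="iso-8859-1"
+# iconv-hook <standard-name> <local-name>; the local name here is hypothetical
+iconv-hook iso-8859-1 iso88591
+```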
+
+### Umlauts, accents, and other non-ASCII characters are displayed fine in some mails, but hidden in others
+
+Make sure the mails have a proper charset declaration in the header. For
+example:
+
+    Content-Type: text/plain; charset=iso-8859-15
+    Content-Transfer-Encoding: 8bit
+
+If the charset label is missing or wrong, or these headers are absent
+entirely, you can still try to make Mutt work around the problem on the
+fly. Example for Western users whose broken mails are in fact mostly in
+Latin-1 or CP-1252: declare CP-1252 as the default assumed charset for
+broken mails.
+
+    charset-hook ^us-ascii$ cp1252
+    charset-hook ^iso-8859-1$ cp1252
+    unset strict_mime
+    set assumed_charset="cp1252"
+
+or
+
+    charset-hook US-ASCII     ISO-8859-1
+    charset-hook x-unknown    ISO-8859-1
+    charset-hook windows-1250 CP1250
+    charset-hook windows-1251 CP1251
+    charset-hook windows-1252 CP1252
+    charset-hook windows-1253 CP1253
+    charset-hook windows-1254 CP1254
+    charset-hook windows-1255 CP1255
+    charset-hook windows-1256 CP1256
+    charset-hook windows-1257 CP1257
+    charset-hook windows-1258 CP1258
+
+Another example, for Chinese users whose broken mails are in fact mostly
+in GB2312:
+
+    charset-hook ^us-ascii$ gb2312
+    unset strict_mime
+    set assumed_charset="gb2312"
+
+In more specific cases you can use the \<**edit-type**\> function to
+manually override a wrong label. By default it is bound to the ^E key.
+From the index or pager it acts on the body of the mail, while from the
+attachment menu it acts on the individual part selected.
+
+See also: [PatchList](PatchList): assumed\_charset
+
+### Umlauts, accents, and other non-ASCII characters are only displayed wrong when using auto\_view
+
+First, imagine a situation where you have to use [MIME
+Autoview](http://www.mutt.org/doc/devel/manual.html#auto-view), e.g. to
+display `text/html` content in the Mutt pager.
+
+You get a mail with the following header:
+
+    Content-Type: text/html; charset="iso-8859-1"
+
+your locales are:
+
+    $ locale
+    LANG=en_US.UTF-8
+    LC_CTYPE="en_US.UTF-8"
+    (...)
+
+your mailcap looks something like:
+
+    text/html; w3m -dump %s; copiousoutput
+
+and there is one `auto_view` in your muttrc:
+
+    auto_view text/html
+
+When you open this mail in the Mutt pager, Mutt spawns *w3m* (or any
+other text browser defined in mailcap), *w3m* dumps text generated from
+the input HTML file (`%s`) back to Mutt, and the Mutt pager displays it
+-- unfortunately incorrectly.
+
+The problem is that *w3m* does not know anything about the character
+encoding of the input file. *w3m* can only guess a (possible)
+charset from your locale, but in our example the charsets don't match
+(`iso-8859-1 != UTF-8`).
+
+One can get around this with the `%{charset}` variable in
+`mailcap`:
+
+    text/html; w3m -I %{charset} -T text/html -dump; copiousoutput
+
+(Don't be confused by the missing `%s` -- *w3m* can read data from
+stdin, so `%s` is not needed. See [Advanced mailcap
+Usage](http://www.mutt.org/doc/devel/manual.html#id929354) for details.)
+
+The *w3m* help output says:
+
+    $ w3m -h
+       (...)
+       -I charset       document charset
+       -T type          specify content-type
+       (...)
+
+With this entry, your mutt-pager will print something like:
+
+    [-- Autoview using w3m -I 'iso-8859-1' -T text/html -dump --]
+      (...)
+      Here are some Umlauts: äöü ÄÖÜ
+      (...)
+
+As you can see, Mutt resolved `%{charset}` correctly into
+`iso-8859-1`. Of course the input-charset options above depend on your
+preferred text browser.
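+
+For instance, a mailcap entry for *lynx* might look like the sketch
+below. It assumes a UTF-8 terminal as in the locale example above; adjust
+`-display_charset` to your own `$charset`.
+
+```
+text/html; lynx -assume_charset=%{charset} -display_charset=utf-8 -dump %s; nametemplate=%s.html; copiousoutput
+```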
+
+### Characters are replaced by ? when charsets and fonts are correctly set up
+
+The problem here is that characters in the document's charset are simply
+not available in Mutt's current charset. This is particularly prevalent
+in documents created by Microsoft agents. Mutt can be instructed to make
+a best-effort attempt to replace the missing characters with something
+similar by appending `//TRANSLIT` to the charset setting (e.g.
+`set charset=iso-8859-1//TRANSLIT`).
+
+**Note:** However nice this "approximation" trick may be, it is only a
+workaround. The best solution is to upgrade to a more capable terminal,
+with a charset able to display all wanted characters directly. But that
+is not always possible or easy.
+
+### How can I check if locales work before I blame Mutt for it?
+
+Perl is sensitive to proper locale settings. On certain distros (e.g.
+Debian) it will complain when the charset settings are incorrect. Try:
+
+    perl -e ""
+
+This should do nothing and print nothing. If it gives a loud ugly warning
+about LANG, LC\_CTYPE and LC\_ALL, something is wrong. But if it stays
+silent, that may only be because it is configured not to warn (how?). To
+test for that, run:
+
+    env LC_ALL=nocharset perl -e ""
+
+and verify that you *do* get an ugly warning with it.
+
+GNU *ls* also uses $LC\_CTYPE. Simply create a file with non-ASCII
+characters in its name (`touch äöü`) and check whether `ls` lists the
+proper name, or just "???". To test $LC\_MESSAGES, call GNU *grep*
+without arguments and check that the usage message is localized (German
+shown here):
+
+    Aufruf: grep [OPTION]... MUSTER [DATEI]...
+    grep --help gibt Ihnen mehr Informationen.
+
+(Obviously, this method does not work for English locales.)
+
+### UTF-8 chars are displayed fine, but the screen is garbled
+
+Mutt has to be linked against a term library with wide char support. For
+ncurses, this is the libncurses**w** library.
+
+    $ mutt -v | grep using
+    System: Linux 2.4.25-planck (i686) [using ncurses 5.4]
+    $ ldd `which mutt` | grep curses
+    libncursesw.so.5 => /usr/lib/libncursesw.so.5 (0x40023000)
+
+To get libncursesw, compile ncurses with `--enable-widec`. Debian users
+install the libncursesw5 package. (On Debian/Woody (stable), install
+mutt-utf8. Starting with Debian/Sarge, Mutt is already linked against
+libncursesw; try `apt-get build-dep mutt` if you compile your own Mutt.)
+
+The default S-Lang does not seem to work with UTF-8; relink Mutt against
+libncursesw instead. (Hello Gentoo users :-)
+
+S-Lang needs the UTF-8 patch to work with UTF-8. Here it is:
+<http://www.emaillab.org/mutt/tools/slang-1.4.8-utf8.diff.gz> (This
+displays CJK chars more correctly than ncursesw.)
+
+### I tuned all the variables correctly, but my messages are garbled
+
+Miscoded characters can disrupt charset transcoding, or charset
+auto-sensing by your $editor. Make sure that your signature, aliases,
+muttrc, /etc/Muttrc, and any files sourced are written with the right
+charset. Make sure that the charset of **$locale** (used to localize
+date and time) matches your **$charset**. Make sure that the mail you
+quote was cleanly displayed before.
+
+**Tip:** Autoconvert the config files on the fly from their fixed
+charset to the current $charset.
+
+Convert all your files once and for all to one given charset, your
+preferred one (UTF-8 in this example). From now on edit them only in
+this charset. Then add at the **beginning** of your muttrc:
+
+    set config_charset=utf-8
+    set signature="iconv -f utf-8 ~/.signature |"
+    set locale=`echo "${LC_ALL:-${LC_TIME:-${LANG}}}"`
+
+**Note:** The $config\_charset feature has been included since Mutt 1.5.7.
+
+The **$editor** used by Mutt to compose messages must be configured to
+read and write files in the current locale's charset, without smart
+auto-sensing of the file's charset. When used for the \<**edit**\> function
+(edit the raw message), auto-sensing can help. When used to edit muttrc,
+signature, or aliases, hardcode the charset previously chosen as
+**$config\_charset**.
+
+Regarding your editor of choice: Some distros change the defaults of the
+editor you use or the defaults are not good enough. For example some
+distributions set the **fileencoding of Vim to UTF-8** no matter what
+locale the user chooses to use. Say the user chooses LANG="de\_DE@euro".
+Then displaying received messages containing umlauts or other special
+characters is most likely no problem at all. But writing messages
+results in a total mess. For instance sending a string containing
+"öäüß@€" results in "Ã¶Ã€ÃŒÃ\\237@â\\202¬". You can fix this by
+setting up your own ~/.vimrc holding the following:
+
+    set encoding&       " terminal charset: follows current locale
+    set termencoding=
+    set fileencodings=  " charset auto-sensing: disabled
+    set fileencoding&   " auto-sensed charset of current buffer
+
+These settings in fact reset the options to Vim's sensible defaults. Only
+**fileencodings** differs: its default value is very nice, but can
+sometimes hurt Mutt. Ideally it should be emptied **only** when Vim is
+called from Mutt to compose a message, not in general (how?).
+
+### Attached text files get sent misencoded with wrong charset
+
+By default Mutt assumes the text files you attach are originally in the
+same charset as your terminal. Upon sending, Mutt will convert those
+files from **$charset** to one of **$send\_charset**. This fails badly
+for any file that was **not** originally in **$charset**.
+
+There are two solutions:
+
+  - Interactively change the attachment's charset to the file's real
+    charset in the compose menu before sending, using the
+    \<**edit-type**\> function (bound to the ^T key by default) and
+    replying **no** to the "Convert?" question. This unfortunately
+    bypasses automatic selection of the best-suited sending charset.
+
+  - Activate auto-sensing of the original charset with:
+
+<!-- end list -->
+
+    set file_charset="utf-8:iso-8859-1"
+
+Mutt then checks each charset listed in **$file\_charset** in turn. The
+first charset in which the text file is entirely valid is assumed to be
+the file's charset. Upon sending, Mutt will convert the file from the
+auto-sensed charset to one of **$send\_charset**.
+
+**Note**: This "auto-sensing" is really educated guessing, and can fail.
+Keep an eye on the compose menu, which displays for each attachment the
+charset chosen for sending (after the **$charset** or **$file\_charset** to
+**$send\_charset** conversion). In particular, it cannot
+distinguish similar 8-bit charsets such as Latin-1 from Latin-2 or
+CP-850. UTF-8 plus **one** 8-bit charset is OK, no more.
+Japanese users may use "iso-2022-jp:euc-jp:shift\_jis:utf-8", which works well
+because those charsets are coded very differently and thus are easily
+distinguishable.
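+
+As a rough sketch of how the two variables fit together, a muttrc could
+contain the lines below. The `$file_charset` value is the one from the
+example above; the `$send_charset` value is believed to match Mutt's
+default and is shown only for illustration, so check the manual of your
+version before relying on it.
+
+```
+# Charsets tried, in order, to guess the original charset of attached text files
+set file_charset="utf-8:iso-8859-1"
+# Charsets tried, in order, for outgoing text; the first one that can
+# represent the whole text is used (value assumed to be the default)
+set send_charset="us-ascii:iso-8859-1:utf-8"
+```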
+
+**Note**: **$file\_charset** is one of the numerous features provided by
+Takashi Takizawa in his Japanese patch. It is also part of the compat
+patch and of the tt.assumed\_charset patch. See
+[PatchList](PatchList) for more information. The feature is integrated in the Debian and Gentoo Mutt packages.
+
+### MIME attachment filenames are displayed as =?iso-8859-1?Q
+
+The filename is encoded with RFC 2047 encoded-words, which that standard
+does not allow in MIME parameters (RFC 2231 defines the proper encoding
+for them); this is commonly produced by Microsoft Outlook and some
+other MUAs.
+
+Decode these filenames by setting this parameter:
+
+    set rfc2047_parameters