
Add Encoding Options to Syntaxes and filetypes.conf

In order to correctly parse and convert source files of languages that use an encoding other than UTF-8, it would be useful to introduce a new Encoding attribute in langDefs. For example:

Description = "Hugo"
Categories = {"source", "interactive fiction"}
Encoding = "ISO-8859-1"

Right now we have at least two languages (Alan and Hugo) from the pre-Unicode era which strictly require single-byte encoding for their sources (ISO-8859-1 being the norm, although others are possible too).

By declaring the encoding directly in the syntax definition, we'd guarantee correct batch conversion of a set of source files that might include a mixture of UTF-8 and ISO-encoded sources from various languages.
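To illustrate why the declared encoding matters, here is a minimal Python sketch (not Highlight code). The sample line and the `decode_source` helper are made up for illustration; it only shows that ISO-8859-1 bytes decode cleanly with the right encoding and are rejected when treated as UTF-8.

```python
# Hypothetical sample: one line of a legacy source file saved as ISO-8859-1.
raw = 'print "Café à côté"'.encode("iso-8859-1")


def decode_source(data: bytes, encoding: str) -> str:
    """Decode raw source bytes using the given encoding name."""
    return data.decode(encoding)


# With the declared encoding, the accented characters are recovered intact.
text = decode_source(raw, "iso-8859-1")

# Treated as UTF-8, the same bytes are not a valid sequence:
# the byte for "é" starts a multi-byte sequence that is never completed.
try:
    decode_source(raw, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

In a single-byte encoding every character occupies exactly one byte, so `len(raw)` equals `len(text)` here.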

Currently, one needs to use the --encoding option to handle them properly.

It would also be useful to allow defining a syntax's encoding in filetypes.conf, since this could be used to override the syntax defaults on a per-project basis. For example, Hugo and Alan are interactive fiction languages, so their actual encoding might vary depending on the spoken language of the text adventure (e.g. ISO-8859-2/3/4) or the legacy machine (or emulator) on which the adventure is being created (e.g. "mac" or "dos").

While ISO-8859-1 is going to be the most common encoding choice nowadays, it's mostly useful just to enforce single-byte encoding and prevent the source from being treated as UTF-8. But users might wish to override this in their own project via a custom filetypes.conf.
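The lookup order proposed here could be sketched roughly as follows. This is a Python illustration of the idea only, not how Highlight is implemented; `resolve_encoding`, `syntax_defaults`, and `project_overrides` are hypothetical names, and the UTF-8 fallback is an assumption.

```python
# Assumed fallback when neither the syntax nor the project declares an encoding.
FALLBACK_ENCODING = "UTF-8"


def resolve_encoding(syntax: str, syntax_defaults: dict, project_overrides: dict) -> str:
    """Pick the input encoding for a syntax.

    Order: project-level override (as a custom filetypes.conf might provide),
    then the default declared in the syntax definition, then the fallback.
    """
    if syntax in project_overrides:
        return project_overrides[syntax]
    return syntax_defaults.get(syntax, FALLBACK_ENCODING)


# Hypothetical data: defaults as they might be declared in the syntax definitions...
syntax_defaults = {"hugo": "ISO-8859-1", "alan": "ISO-8859-1"}
# ...and a project that writes its Hugo adventure in a Central European language.
project_overrides = {"hugo": "ISO-8859-2"}

hugo_encoding = resolve_encoding("hugo", syntax_defaults, project_overrides)
alan_encoding = resolve_encoding("alan", syntax_defaults, project_overrides)
other_encoding = resolve_encoding("python", syntax_defaults, project_overrides)
```

With this ordering, the project's Hugo files use the override, Alan files keep the syntax default, and any syntax without a declaration is still treated as UTF-8.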

Encodings other than UTF-8 are mostly going to be needed with legacy languages, but since retro-machine emulation is a growing trend, chances are that this is going to be needed more in the future. Just think of all the emulators for the C-64, Spectrum, Amiga, etc. There are whole communities thriving around emulation of 8-bit computers, with people actively programming games and software on them.
