Autodetect blocks of code

Created by: alexweissman

My project (and I suspect many others) tends to attract a lot of coders who are new to collaborative projects and therefore not used to Markdown. As a result, we get code dumps, sometimes very large ones, that are very difficult to read because the user has not wrapped them in code fences (and likely doesn't know how, or that they should in the first place).

It appears that Gitter already distinguishes among languages when the code is inside a code fence (I'm guessing it uses highlight.js). Would it be possible to distinguish between code and natural language in general, and then automatically fence messages that are likely to be code blocks?

It looks like highlight.js actually tries to parse the code in various languages, ranking different language features by relevance and then aggregating those scores somehow. I'm not sure whether distinguishing natural language would be as straightforward, or whether it would require some kind of statistical classifier. Obviously it wouldn't work 100% of the time, but it would go a long way toward improving the user experience!
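
To make the idea concrete, here is a rough sketch of the simplest version: score each line against a few code-like signals and fence the message when most lines look like code. This is not how highlight.js or Gitter works. The names `CODE_SIGNALS`, `code_score`, and `auto_fence`, along with the patterns, weights, and thresholds, are all made up for illustration, and a real implementation would probably need a proper classifier.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting fences

# Hypothetical line-level signals; patterns and weights are illustrative guesses.
CODE_SIGNALS = [
    (re.compile(r"[;{}]\s*$"), 2),        # line ends with ; { or }
    (re.compile(r"^\s*(def|class|function|import|return|var|let|const"
                r"|public|private|#include)\b"), 2),  # leading keyword
    (re.compile(r"^(\t| {2,})\S"), 1),    # indented line
    (re.compile(r"\w+\([^)]*\)"), 1),     # call-like token: foo(...)
    (re.compile(r"(==|!=|=>|->|::|\+=|&&|\|\|)"), 1),  # common operators
]


def code_score(message: str) -> float:
    """Return the fraction of non-empty lines that look like code."""
    lines = [ln for ln in message.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    hits = 0
    for ln in lines:
        weight = sum(w for pattern, w in CODE_SIGNALS if pattern.search(ln))
        if weight >= 2:  # assumed cutoff for "this line is probably code"
            hits += 1
    return hits / len(lines)


def auto_fence(message: str, threshold: float = 0.5, min_lines: int = 2) -> str:
    """Wrap the message in a code fence if it is probably an unfenced code dump."""
    if FENCE in message:
        return message  # the user already fenced something; leave it alone
    non_empty = [ln for ln in message.splitlines() if ln.strip()]
    if len(non_empty) < min_lines or code_score(message) < threshold:
        return message
    return f"{FENCE}\n{message.strip()}\n{FENCE}"
```

With these guessed weights, a short JavaScript function pasted as-is would be fenced, while an ordinary two-sentence question would pass through unchanged. Short one-liners and prose that happens to contain a lot of punctuation are the obvious failure cases.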

Edited Sep 02, 2020 by 🤖 GitLab Bot 🤖