WIP: Fix encoding issue for slashed-o character
WIP: I can narrow down the problem, but I'm not actually sure what the correct solution is.
It seems that `GuessCharset` is meant to be heuristic, so perhaps we're OK to just live with some error rate from it? Still, this particular case, a string containing a single Danish character, is probably not that uncommon, and yet it fails to be detected correctly.
It seems that this string should be detected as UTF-8, but it's not, and the detected character set doesn't even include the ø character being encoded.
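
To make the failure mode concrete, here is a minimal Python sketch. It is not the code under review, and it is not a claim about how `GuessCharset` works internally. It only shows that ø is a valid two-byte sequence in UTF-8, what those bytes look like when decoded as a single-byte charset, and one possible mitigation: try strict UTF-8 decoding before falling back to a guess. The helper name `decode_with_utf8_first` and the `iso-8859-1` fallback are assumptions made for illustration.

```python
from typing import Tuple


def decode_with_utf8_first(raw: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Hypothetical helper: return (text, charset), trying strict UTF-8 first.

    The fallback charset is an assumption for this sketch; a real
    implementation would call its heuristic detector at that point.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


# "ø" (U+00F8) encodes to two bytes in UTF-8.
sample = "ø".encode("utf-8")
print(sample)                       # b'\xc3\xb8'

# The same bytes read as ISO-8859-1 become two unrelated characters.
print(sample.decode("iso-8859-1"))  # Ã¸

# Valid UTF-8 input is accepted without consulting any heuristic.
print(decode_with_utf8_first(sample))
```

The trade-off of a UTF-8-first check is that short inputs in other encodings can occasionally happen to be valid UTF-8 as well, so it lowers the error rate for cases like this one rather than eliminating misdetection.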
Related to gitlab#33090 (closed)
See analysis in gitlab#33090 (comment 224800784)