
WIP: Fix encoding issue for slashed-o character

Dylan Griffith requested to merge 33090-encoding-issue-with-slashed-o into master

WIP: I can narrow down the problem, but I'm not sure what the correct solution is.

It seems that GuessCharset is meant to be heuristic, so perhaps we're OK to live with some error rate from it. Still, a string containing just a single Danish character is probably not that uncommon, and yet it fails to be detected correctly.

It seems that this string should be detected as UTF-8, but it isn't, and the detected character set doesn't even include the ø character being encoded.
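To make the symptom concrete, here is a minimal Python sketch of the two checks implied above: whether the raw bytes are valid UTF-8, and whether a candidate character set can represent ø at all. It does not call GuessCharset or reproduce its heuristic. The helper names `is_valid_utf8` and `charset_can_encode` are hypothetical, and the charsets listed are illustrative only, not the one actually detected in the issue.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def charset_can_encode(text: str, charset: str) -> bool:
    """Return True if every character in text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# ø is U+00F8, which UTF-8 encodes as the two bytes 0xC3 0xB8.
raw = "ø".encode("utf-8")
print(raw, is_valid_utf8(raw))

# Illustrative charsets only (assumption: not taken from the issue).
for name in ("utf-8", "latin-1", "ascii", "iso8859_7"):
    print(name, charset_can_encode("ø", name))
```

A detector result could be sanity-checked this way: if the bytes are valid UTF-8 and the guessed charset cannot encode the decoded text, the guess is suspect. Whether that belongs in the fix is an open question for this MR.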

Related to gitlab#33090 (closed)

See analysis in gitlab#33090 (comment 224800784)

Edited by 🤖 GitLab Bot 🤖
