Blobs and diffs should display non-utf8 data correctly in the browser
Whether external MR diffs are enabled or not, the diff for a merge request is not displayed correctly when the underlying files being diffed do not contain UTF-8-compatible data.
For SHIFT-JIS, we end up with mojibake on GitLab.com today (where diffs are stored in the database). If we enabled external MR diffs, we might see an exception instead.
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32862 will "fix" the external diff case so we also see mojibake, rather than a 500, but that's still not amazing.
Steps to reproduce
Create an MR that diffs a SHIFT-JIS or Cyrillic-encoded file. Any file that is not valid UTF-8 will do.
What is the current bug behavior?
Mojibake. Note that the file contains seven bytes, but only 3 characters are shown - plain ASCII.
What is the expected correct behavior?
We should render the 3 hiragana characters, regardless of how the diff is stored / cached / highlighted.
Relevant logs and/or screenshots
Output of checks
This bug happens on GitLab.com
I expect we need to fix this by transcoding the non-UTF-8-encoded parts into the UTF-8 representation of the characters in the source encoding. I don't think we can reasonably pass the source encoding through to the browser.
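To make the transcoding idea concrete, here is a minimal sketch in Python (not GitLab's actual code). The helper name `to_display_text` and its `source_encoding` parameter are hypothetical; how the source encoding gets detected is out of scope here and simply assumed to be known. The sample bytes are three hiragana characters plus a newline in Shift JIS, seven bytes in total.

```python
def to_display_text(raw: bytes, source_encoding: str) -> str:
    """Return text that can be safely embedded in a UTF-8 page.

    `source_encoding` is assumed to be known already (hypothetical
    parameter); a real implementation would need to detect it.
    """
    try:
        # Data that is already valid UTF-8 needs no transcoding.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Interpret the bytes in the source encoding instead; any
        # undecodable bytes become U+FFFD rather than raising.
        return raw.decode(source_encoding, errors="replace")


# Three 2-byte Shift JIS characters followed by a newline (7 bytes).
raw = b"\x82\xa0\x82\xa2\x82\xa4\n"

text = to_display_text(raw, "shift_jis")

# What would be sent to the browser: the same characters, as UTF-8.
page_bytes = text.encode("utf-8")
```

Only the rendered view is transcoded in this sketch; `raw` is left untouched, so a raw download could still serve the original bytes.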
Whatever we do, when downloading the raw file, diff, patch, commit, etc, we should always retain the original encoding. I think this is the case at present.