Blobs and diffs should display non-utf8 data correctly in the browser
Summary
Spotted in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32862#note_216167183
Whether external MR diffs are enabled or not, the diff for a merge request is not displayed correctly when the underlying files being diffed do not contain UTF-8-compatible data.
For Shift-JIS, we end up with a mojibake display on GitLab.com today (diffs stored in-database). If we enabled external MR diffs, we might see an exception (a 500) instead.
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32862 will "fix" the external diff case so we also see mojibake, rather than a 500, but that's still not amazing.
Steps to reproduce
Create an MR whose diff touches a file encoded in Shift-JIS or a Cyrillic code page (for example Windows-1251 or KOI8-R). Any encoding that is not UTF-8-compatible will do.
Example Project
hiroponz/non-utf8-encoding-test!1 (diffs)
What is the current bug behavior?
Mojibake. Note that the file contains seven bytes, but only three characters are shown, all plain ASCII.
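To make the byte count concrete, here is a minimal sketch in Python (not GitLab code). It assumes a hypothetical file holding three hiragana and a trailing newline in Shift-JIS; the actual characters in the example project may differ. It shows why such a file is seven bytes long and why treating those bytes as UTF-8 cannot produce the right text.

```python
# Hypothetical sample: three hiragana plus a newline, encoded as Shift-JIS.
# The real file in the example project may contain different characters.
text = "あいう\n"
raw = text.encode("shift_jis")

# Each hiragana takes 2 bytes in Shift-JIS, plus 1 byte for the newline.
print(len(raw))

# Strict UTF-8 decoding rejects these bytes outright.
try:
    raw.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

# Lenient decoding "succeeds" but substitutes U+FFFD replacement characters,
# so the original text is lost.
lenient = raw.decode("utf-8", errors="replace")

# Decoding with the real source encoding recovers the text.
recovered = raw.decode("shift_jis")
print(strict_utf8_ok, repr(lenient), repr(recovered))
```

The point is only that the information needed to render the characters is still in the bytes; it is the choice of decoder that loses it.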
What is the expected correct behavior?
We should render the 3 hiragana characters, regardless of how the diff is stored / cached / highlighted.
Relevant logs and/or screenshots
Output of checks
This bug happens on GitLab.com
Possible fixes
I expect we need to fix this by transcoding the non-UTF-8 parts into the UTF-8 representation of the characters they encode in the source encoding. I don't think we can reasonably pass the source encoding through to the browser.
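A rough sketch of that transcoding step, in Python for illustration only. The function name `to_display_utf8` and the candidate list are hypothetical, and trying encodings in order is a stand-in for proper encoding detection; I'm not claiming this is how GitLab does or should detect encodings. Guessing this way is a heuristic and can pick the wrong encoding when the bytes are valid in more than one.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate order; a real implementation would likely use a
# proper encoding detector rather than a fixed list.
DEFAULT_CANDIDATES = ("utf-8", "shift_jis", "cp1251")


def to_display_utf8(
    raw: bytes, candidates: Sequence[str] = DEFAULT_CANDIDATES
) -> Tuple[str, Optional[str]]:
    """Return (text for display, encoding that worked or None)."""
    for encoding in candidates:
        try:
            # Strict decoding: only accept an encoding that fits every byte.
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing fitted: fall back to UTF-8 with replacement characters so the
    # page still renders instead of raising.
    return raw.decode("utf-8", errors="replace"), None


original = "あいう\n".encode("shift_jis")
display_text, detected = to_display_utf8(original)

# Bytes sent to the browser are UTF-8; the stored bytes are left untouched.
browser_bytes = display_text.encode("utf-8")
print(detected, display_text, original != browser_bytes)
```

Only the display path is transcoded here. The `original` bytes are never modified, so anything served as a raw download can keep the source encoding.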
Whatever we do, when downloading the raw file, diff, patch, commit, etc., we should always retain the original encoding. I think this is the case at present.