Some PDFs shown as text rather than binary files
Zendesk: https://gitlab.zendesk.com/agent/tickets/49121
A customer has an issue where some PDFs are correctly recognized as binary files, but others attempt to load as text. There doesn't seem to be any noticeable difference between the files or the way they were added. Both good and bad PDFs exist in the same repository. They have also added a `.gitattributes` file with `*.pdf binary`, and it does not help.
I found https://github.com/github/linguist/issues/1873#issuecomment-67377129, which suggests there are half a dozen different things that might identify a binary file. In `gitlab_git` we only check `blob.binary?`, which apparently is `false` in this case (theory, haven't tested).
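One way to gather more information is to compare the good and bad PDFs against Git's own text/binary heuristic. Git's `buffer_is_binary` (in `xdiff-interface.c`) treats a buffer as binary only if a NUL byte appears in its first 8000 bytes, so a PDF whose opening section is plain ASCII would be classified as text. The Ruby sketch below reproduces that check. `git_considers_binary?` is a hypothetical helper, not part of `gitlab_git`, and whether `blob.binary?` behaves the same way is still untested.

```ruby
# Hypothetical helper, not part of gitlab_git: mirrors Git's buffer_is_binary,
# which only looks for a NUL byte within the first 8000 bytes of a blob.
FIRST_FEW_BYTES = 8000

def git_considers_binary?(data)
  data.byteslice(0, FIRST_FEW_BYTES).include?("\0")
end

# A PDF whose header and first objects are plain ASCII (e.g. uncompressed
# streams) has no early NUL byte, so this heuristic calls it text.
ascii_pdf = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
puts git_considers_binary?(ascii_pdf)              # false
puts git_considers_binary?("%PDF-1.4\n\x00\x01")   # true
```

Running `git_considers_binary?(File.binread(path))` over each good and bad PDF would show whether the bad ones simply lack a NUL byte near the start.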
- How can we gather more information about this?
- How can we improve blob binary detection so that PDFs don't attempt to render as text?
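On the second question, one possible improvement is to check for a known file signature before falling back to content heuristics. Every PDF begins with `%PDF-`, so a signature check would catch PDFs even when their first 8000 bytes contain no NUL byte. This is a minimal sketch under assumptions: `likely_binary?`, `BINARY_SIGNATURES`, and the `marked_binary:` keyword (standing in for however the `binary` attribute from `.gitattributes` would be looked up) are all hypothetical and not part of `gitlab_git`.

```ruby
# Hypothetical helpers for illustration only; not part of gitlab_git.
FIRST_FEW_BYTES = 8000

# Leading bytes ("magic numbers") of some common binary formats.
BINARY_SIGNATURES = [
  "%PDF-".b,              # PDF
  "\x89PNG\r\n\x1A\n".b,  # PNG
  "PK\x03\x04".b          # ZIP (also DOCX, XLSX, JAR)
].freeze

# marked_binary: assumed to be true when .gitattributes sets `binary`
# for the blob's path; how that lookup happens is out of scope here.
def likely_binary?(data, marked_binary: false)
  return true if marked_binary

  head = data.byteslice(0, FIRST_FEW_BYTES).b
  # Known signature first, then Git's NUL-byte heuristic as a fallback.
  BINARY_SIGNATURES.any? { |sig| head.start_with?(sig) } || head.include?("\0".b)
end

puts likely_binary?("%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n") # true
puts likely_binary?("# README\nplain text\n")                          # false
puts likely_binary?("notes", marked_binary: true)                      # true
```

Checking the attribute first would also make the customer's `*.pdf binary` rule take effect, assuming the viewer does not already consult `.gitattributes`.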
This is a big problem for this customer because some less technical people use GitLab and need to download the PDFs. When one of these PDFs is opened in the file viewer, it pretty much crashes Chrome. They also don't want to ask these less technical people to install SourceTree just to download a PDF.
cc @yorickpeterse @smcgivern: this is the bug we discussed in chat earlier.