Elasticsearch should properly index PDF documents for search
Problem to solve
Currently, PDF documents are rendered within the GitLab project view. This allows users to navigate to a PDF document within their project and view its contents as they would if the file were local to them. However, PDF content is not properly rendered in search results, so users are unable to find the content of the files they're searching for.
Intended users
Further details
Example PDF Search Results: https://gitlab.com/search?group_id=9970&repository_ref=&scope=blobs&search=extension%3Apdf+%2A&snippets=
One of those results in Project View: https://gitlab.com/gitlab-org/build/omnibus-mirror/exiftool/blob/bec11e1eb67734b8626d8dd040d6b460ef468bb4/html/exiftool_pod.pdf
This should be considered as part of the broader question of how we deal with index size across GitLab.
UX should also consider how we handle document types that don't typically have line numbers for display, for example whether these need a section separate from code or some other proposal.
Proposal
The GitLab Elasticsearch Indexer should be updated to properly index the content of PDFs. This would allow the results to be displayed within the search results page.
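As a rough illustration only, one possible approach is to detect PDF blobs and route them through Elasticsearch's ingest attachment processor, which extracts text from base64-encoded binary fields. The sketch below is not how the GitLab Elasticsearch Indexer works today; the function names (`is_pdf`, `build_blob_document`) and the document field names (`path`, `data`, `content`) are hypothetical and chosen only for this example.

```python
import base64

# PDF files begin with this signature.
PDF_MAGIC = b"%PDF-"

# Pipeline body for the Elasticsearch ingest attachment processor.
# The source field name "data" is an assumption made for this sketch.
ATTACHMENT_PIPELINE = {
    "description": "Extract text from PDF blobs (illustrative)",
    "processors": [{"attachment": {"field": "data"}}],
}


def is_pdf(blob: bytes) -> bool:
    """Return True if the blob starts with the PDF signature."""
    return blob.startswith(PDF_MAGIC)


def build_blob_document(path: str, blob: bytes) -> dict:
    """Build a hypothetical index document for a repository blob.

    PDFs are base64-encoded into "data" so an attachment pipeline can
    extract their text; other blobs are stored as decoded text.
    """
    doc = {"path": path}
    if is_pdf(blob):
        doc["data"] = base64.b64encode(blob).decode("ascii")
    else:
        doc["content"] = blob.decode("utf-8", errors="replace")
    return doc
```

Extracted text would add to the index size concern raised under Further details, so any real implementation would likely need a cap on file size or on the amount of text extracted.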
Permissions and Security
Documentation
Documentation should be updated to inform users of PDF as a supported document type for indexing within Elasticsearch.