Umlauts in image filenames break relative-path images in AsciiDoc rendering

Summary

In AsciiDoc documents rendered in the GitLab web interface, images referenced by a relative path are not displayed if their filename contains certain (possibly any) non-ASCII characters.

Steps to reproduce

  1. Start in a new, empty directory:
    cd $(mktemp -d)
  2. Download https://d33wubrfki0l68.cloudfront.net/dbfc383d23401ccbed7262a1822dba9babecb949/69a10/images/sunset.jpg and rename it to sunsët.jpg:
    wget https://d33wubrfki0l68.cloudfront.net/dbfc383d23401ccbed7262a1822dba9babecb949/69a10/images/sunset.jpg
    mv sunset.jpg sunsët.jpg
    (The bug should also reproduce with any other image file and with filenames containing other non-ASCII characters.)
  3. Create an AsciiDoc file README.adoc next to the image file that references the image by a relative path:
    cat <<eof > README.adoc
    = Example document
    
    image::sunsët.jpg[placeholder text]
    eof
    (Other file locations and names should also reproduce the bug, as long as the relative path is correct.)
  4. (Optional) Verify that Asciidoctor itself renders the image correctly:
    asciidoctor-pdf README.adoc
    evince README.pdf &  # make sure the picture is shown in the PDF
  5. Create a new repository and commit both the image file and the AsciiDoc file:
    git init
    git add sunsët.jpg README.adoc
    git commit -m'bug reproduction'
  6. Push to a new GitLab project and view README.adoc in the GitLab web interface.

Example Project

  • Created with the reproduction instructions above: das-g/non-ascii-image-name-in-asciidoc
  • More thorough examples: das-g/asciidoctor-relative-image-path-vs.-gitlab

What is the current bug behavior?

The GitLab preview of the AsciiDoc document shows the alt text "placeholder text" instead of the picture.

Interesting observations:

  • The link around the missing picture still points to the image file: clicking the placeholder text makes the browser display the image on its own (without the AsciiDoc document or the GitLab UI).
  • The img element's src attribute is just
    sunsët.jpg
    which the browser resolves to https://gitlab.com/das-g/suns%C3%ABt.jpg (on the project overview page) or https://gitlab.com/das-g/non-ascii-image-name-in-asciidoc/-/blob/master/suns%C3%ABt.jpg (when viewing README.adoc on master), whereas it should probably be
    /das-g/non-ascii-image-name-in-asciidoc/-/raw/master/suns%C3%ABt.jpg
    which the browser would resolve to https://gitlab.com/das-g/non-ascii-image-name-in-asciidoc/-/raw/master/suns%C3%ABt.jpg
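The URL resolution described in the last observation follows the standard relative-reference rules of RFC 3986, which Ruby's URI.join implements. The following is a minimal sketch that checks it. It assumes that the project overview is served at the project URL without a trailing slash and that README.adoc is viewed at .../-/blob/master/README.adoc. The browser's percent-encoding of "ë" is emulated with ERB::Util.url_encode.

```ruby
require "erb"
require "uri"

# The browser percent-encodes the UTF-8 bytes of "ë" (0xC3 0xAB).
src = ERB::Util.url_encode("sunsët.jpg")
puts src # suns%C3%ABt.jpg

# Assumed page URLs on which the rendered README.adoc is shown.
overview  = "https://gitlab.com/das-g/non-ascii-image-name-in-asciidoc"
blob_view = "https://gitlab.com/das-g/non-ascii-image-name-in-asciidoc/-/blob/master/README.adoc"

# The src value this issue suggests GitLab should emit instead.
expected = "/das-g/non-ascii-image-name-in-asciidoc/-/raw/master/#{src}"

# A bare relative reference replaces the last path segment of the base URL.
puts URI.join(overview, src)  # https://gitlab.com/das-g/suns%C3%ABt.jpg
puts URI.join(blob_view, src) # https://gitlab.com/das-g/non-ascii-image-name-in-asciidoc/-/blob/master/suns%C3%ABt.jpg

# An absolute-path reference keeps only the scheme and host of the base URL,
# so it resolves to the same raw URL on every page.
puts URI.join(overview, expected) # https://gitlab.com/das-g/non-ascii-image-name-in-asciidoc/-/raw/master/suns%C3%ABt.jpg
```

This is why an unrewritten relative src only works by accident on pages whose URL happens to sit next to the file. An absolute /-/raw/ path resolves the same way everywhere.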

What is the expected correct behavior?

The GitLab preview of the AsciiDoc document displays the picture.

Relevant logs and/or screenshots

  • README.pdf from optional step 4 of the Steps to reproduce above
  • Screenshot: the browser's AsciiDoc document preview shows the placeholder text instead of the picture

Output of checks

This bug happens on GitLab.com

Possible fixes


I don't know (yet) which specific line of code is responsible, but I'm fairly sure that the problem (and/or the potential fix) lies in the class Banzai::Filter::RepositoryLinkFilter.

Here's a new automated test that I believe reproduces the problem.

https://gitlab.com/das-g/gitlab/-/blob/cfd0be48ab8a5600e16e61c57e34bef53a84d5ac/spec/lib/banzai/filter/repository_link_filter_spec.rb#L259-264

Failures:
  1) Banzai::Filter::RepositoryLinkFilter with a valid commit rebuilds relative URL for an image with Umlaut in the repo
     Failure/Error:
       expect(doc.at_css('img')['src'])
         .to eq "/#{project_path}/-/raw/#{ref}/files/images/logo-bläck.png"
       expected: "/namespace263/project1130/-/raw/markdown/files/images/logo-bläck.png"
            got: "files/images/logo-bläck.png"
       (compared using ==)
     Shared Example Group: :valid_repository called from ./spec/lib/banzai/filter/repository_link_filter_spec.rb:375
     # ./spec/lib/banzai/filter/repository_link_filter_spec.rb:262:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:329:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:9:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:320:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:316:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:316:in `block (2 levels) in <top (required)>'
  2) Banzai::Filter::RepositoryLinkFilter with a valid ref rebuilds relative URL for an image with Umlaut in the repo
     Failure/Error:
       expect(doc.at_css('img')['src'])
         .to eq "/#{project_path}/-/raw/#{ref}/files/images/logo-bläck.png"
       expected: "/namespace293/project1160/-/raw/markdown/files/images/logo-bläck.png"
            got: "files/images/logo-bläck.png"
       (compared using ==)
     Shared Example Group: :valid_repository called from ./spec/lib/banzai/filter/repository_link_filter_spec.rb:382
     # ./spec/lib/banzai/filter/repository_link_filter_spec.rb:262:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:329:in `block (3 levels) in <top (required)>'
     # ./spec/support/sidekiq_middleware.rb:9:in `with_sidekiq_server_middleware'
     # ./spec/spec_helper.rb:320:in `block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:316:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:316:in `block (2 levels) in <top (required)>'
Finished in 13 minutes 33 seconds (files took 57.39 seconds to load)
2331 examples, 2 failures
Failed examples:
rspec './spec/lib/banzai/filter/repository_link_filter_spec.rb[1:12:15]' # Banzai::Filter::RepositoryLinkFilter with a valid commit rebuilds relative URL for an image with Umlaut in the repo
rspec './spec/lib/banzai/filter/repository_link_filter_spec.rb[1:13:15]' # Banzai::Filter::RepositoryLinkFilter with a valid ref rebuilds relative URL for an image with Umlaut in the repo
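Both failures show the src attribute returned unchanged ("got: files/images/logo-bläck.png"). One possible explanation has not been checked against GitLab's source. If RepositoryLinkFilter parses the attribute with Ruby's standard URI library, URI.parse raises URI::InvalidURIError for any non-ASCII input. If that error is rescued and the attribute is left as it is, such a path would never be rewritten. In that case any non-ASCII character, not just umlauts, would trigger the bug. The sketch below demonstrates the standard-library behavior and one hypothetical workaround. escape_non_ascii and rebuild_relative_src are illustrative names, not GitLab methods, and the output format is modelled on the spec's expectation.

```ruby
require "uri"

src = "files/images/logo-bläck.png"

# Ruby's standard URI parser rejects non-ASCII input outright.
begin
  URI.parse(src)
  parsed = true
rescue URI::InvalidURIError => e
  parsed = false
  puts "URI.parse failed: #{e.message}"
end

# Hypothetical helper (not GitLab code): percent-encode only the non-ASCII
# characters, byte by byte in UTF-8, so that URI.parse accepts the string.
# ASCII delimiters such as ":" and "/" are untouched, so absolute URLs
# still parse as absolute.
def escape_non_ascii(str)
  str.gsub(/[^[:ascii:]]/) { |ch| ch.bytes.map { |b| format("%%%02X", b) }.join }
end

# Hypothetical rewrite step: classify the reference using the escaped form,
# then prefix the original (unescaped) path with /<project>/-/raw/<ref>/.
def rebuild_relative_src(src, project_path:, ref:)
  uri = URI.parse(escape_non_ascii(src))
  return src unless uri.relative? && !uri.path.empty?

  "/#{project_path}/-/raw/#{ref}/#{src}"
end

puts rebuild_relative_src(src, project_path: "namespace263/project1130", ref: "markdown")
# /namespace263/project1130/-/raw/markdown/files/images/logo-bläck.png
```

Escaping only the non-ASCII characters, rather than the whole string, keeps references such as https://example.com/ä.png classified as absolute, so they are left alone.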
Edited Mar 11, 2021 by Raphael Das Gupta