This project is archived. Its data is read-only.

ArgumentError: invalid byte sequence in UTF-8 when parsing

Created by: walro

Hello,

I ran into the problem described in the title when parsing certain unlucky combinations of text. To reproduce:

irb(main):026:0> xml = "<name>&amp;#2013266165;&amp;#2013265920</name>"
=> "<name>&amp;#2013266165;&amp;#2013265920</name>"
irb(main):027:0> Oga.parse_xml(xml)
ArgumentError: invalid byte sequence in UTF-8
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/entities.rb:83:in `gsub'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/entities.rb:83:in `decode'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/entity_decoder.rb:14:in `decode'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/entity_decoder.rb:5:in `try_decode'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/text.rb:24:in `text'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/character_node.rb:25:in `inspect'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `map'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `inspect'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/element.rb:245:in `block in inspect'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/element.rb:238:in `each'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/element.rb:238:in `inspect'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `map'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `inspect'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/document.rb:92:in `block in inspect'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/document.rb:88:in `each'
    from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/document.rb:88:in `inspect'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/cli/console.rb:14:in `run'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/cli.rb:308:in `console'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor/invocation.rb:126:in `invoke_command'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor.rb:359:in `dispatch'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor/base.rb:440:in `start'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/cli.rb:10:in `start'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/bin/bundle:20:in `block in <top (required)>'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/friendly_errors.rb:7:in `with_friendly_errors'
    from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/bin/bundle:18:in `<top (required)>'
    from /Users/robin/.gem/ruby/2.2.3/bin/bundle:23:in `load'
    from /Users/robin/.gem/ruby/2.2.3/bin/bundle:23:in `<main>'
irb(main):028:0>

Line that raises: https://github.com/YorickPeterse/oga/blob/a938f23a0e5817b5924eff907d804ffaa23cfb8f/lib/oga/xml/entities.rb#L83
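The trace suggests the failure happens lazily. `Oga.parse_xml` itself returns, and the error appears only when irb calls `inspect`, which reaches `Text#text` and `EntityDecoder.try_decode`. The snippet below is a hypothetical, stdlib-only reproduction of one plausible mechanism. It assumes (not confirmed here) that the decoder turns a numeric character reference into a character with `Array#pack('U*')`. Ruby's `pack('U')` accepts values up to 0x7FFFFFFF and emits up to 6 bytes, so a reference such as 2013266165 (0x780000F5, far above U+10FFFF) yields a string that is not valid UTF-8. Any later regexp operation on that string then raises `ArgumentError`.

```ruby
# Hypothetical reproduction: assumes the entity decoder builds the character
# with Array#pack('U*'), which this report does not confirm.
codepoint = 2013266165 # 0x780000F5, far beyond the Unicode maximum U+10FFFF

decoded = [codepoint].pack('U*')
puts decoded.encoding        # UTF-8
puts decoded.valid_encoding? # false: a 6-byte sequence is never valid UTF-8

begin
  # Any regexp scan over the broken string fails, whether or not it matches.
  decoded.gsub(/&/, '&amp;')
rescue ArgumentError => e
  puts e.message # invalid byte sequence in UTF-8
end
```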

I understand that the input is malformed and this may not really be fixable, but it would be great if a more specific error than ArgumentError could be raised.
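A minimal sketch of what a more specific error could look like, assuming the decoder validates each numeric character reference before converting it. The class `InvalidCharacterReference` and the helpers `valid_xml_char?` and `decode_codepoints` are hypothetical names for illustration, not part of Oga's API. The allowed ranges follow the `Char` production of the XML 1.0 specification, which rules out surrogates and anything above U+10FFFF.

```ruby
# Hypothetical error class: illustrative only, not part of Oga.
class InvalidCharacterReference < StandardError; end

# Decimal (&#233;) or hexadecimal (&#xE9;) numeric character references.
CODEPOINT_REFERENCE = /&#(?:x([0-9a-fA-F]+)|([0-9]+));/

# Code points permitted by the Char production of XML 1.0.
def valid_xml_char?(codepoint)
  codepoint == 0x9 || codepoint == 0xA || codepoint == 0xD ||
    (0x20..0xD7FF).cover?(codepoint) ||
    (0xE000..0xFFFD).cover?(codepoint) ||
    (0x10000..0x10FFFF).cover?(codepoint)
end

# Hypothetical helper: replaces numeric references, raising a dedicated
# error instead of producing an invalid UTF-8 string.
def decode_codepoints(input)
  input.gsub(CODEPOINT_REFERENCE) do |match|
    codepoint = $1 ? $1.to_i(16) : $2.to_i

    unless valid_xml_char?(codepoint)
      raise InvalidCharacterReference,
            "#{match} does not reference an allowed XML character"
    end

    codepoint.chr(Encoding::UTF_8)
  end
end

puts decode_codepoints("caf&#xE9; &#38;") # café &

begin
  decode_codepoints("&#2013266165;")
rescue InvalidCharacterReference => e
  puts "rejected: #{e.message}"
end
```

Subclassing `StandardError` lets callers rescue this case on its own; subclassing `ArgumentError` instead would be a possible alternative that keeps existing `rescue ArgumentError` handlers working.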

Thanks!
