ArgumentError: invalid byte sequence in UTF-8 when parsing
Created by: walro
Hello,
I run into the problem described in the title when parsing certain unlucky combinations of text. To reproduce:
irb(main):026:0> xml = "<name>&#2013266165;&#2013265920</name>"
=> "<name>&#2013266165;&#2013265920</name>"
irb(main):027:0> Oga.parse_xml(xml)
ArgumentError: invalid byte sequence in UTF-8
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/entities.rb:83:in `gsub'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/entities.rb:83:in `decode'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/entity_decoder.rb:14:in `decode'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/entity_decoder.rb:5:in `try_decode'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/text.rb:24:in `text'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/character_node.rb:25:in `inspect'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `map'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `inspect'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/element.rb:245:in `block in inspect'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/element.rb:238:in `each'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/element.rb:238:in `inspect'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `map'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/node_set.rb:283:in `inspect'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/document.rb:92:in `block in inspect'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/document.rb:88:in `each'
from /Users/robin/.gem/ruby/2.2.3/gems/oga-1.3.1/lib/oga/xml/document.rb:88:in `inspect'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/cli/console.rb:14:in `run'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/cli.rb:308:in `console'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor/invocation.rb:126:in `invoke_command'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor.rb:359:in `dispatch'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/vendor/thor/lib/thor/base.rb:440:in `start'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/cli.rb:10:in `start'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/bin/bundle:20:in `block in <top (required)>'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/lib/bundler/friendly_errors.rb:7:in `with_friendly_errors'
from /Users/robin/.gem/ruby/2.2.3/gems/bundler-1.10.6/bin/bundle:18:in `<top (required)>'
from /Users/robin/.gem/ruby/2.2.3/bin/bundle:23:in `load'
from /Users/robin/.gem/ruby/2.2.3/bin/bundle:23:in `<main>'
irb(main):028:0>
Line that raises: https://github.com/YorickPeterse/oga/blob/a938f23a0e5817b5924eff907d804ffaa23cfb8f/lib/oga/xml/entities.rb#L83
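I understand that the input is bad (the numeric character references are far above the highest Unicode code point, 0x10FFFF) and that this might not really be fixable, but I think it would be great if a more specific error than a bare ArgumentError could be raised.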
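To illustrate what I mean, here is a rough sketch of decoding that checks the code point before converting it. This is not Oga's actual code: `InvalidEntityError`, `NUMERIC_ENTITY` and `decode_numeric_entities` are names I made up for the example, and I am assuming the decoder turns numeric entities into characters with something like `Array#pack('U')`.

```ruby
# Hypothetical error class; Oga does not define this name.
class InvalidEntityError < StandardError; end

# Matches decimal numeric character references such as "&#65;".
NUMERIC_ENTITY = /&#(\d+);/

# Highest valid Unicode code point, plus the UTF-16 surrogate range,
# which cannot be encoded as valid UTF-8.
MAX_CODEPOINT = 0x10FFFF
SURROGATES    = 0xD800..0xDFFF

# Replaces decimal numeric entities with their characters and raises a
# descriptive error instead of producing an invalid UTF-8 string.
def decode_numeric_entities(input)
  input.gsub(NUMERIC_ENTITY) do |entity|
    codepoint = Regexp.last_match(1).to_i

    if codepoint > MAX_CODEPOINT || SURROGATES.cover?(codepoint)
      raise InvalidEntityError, "#{entity} is not a valid Unicode code point"
    end

    [codepoint].pack('U')
  end
end
```

With this sketch, `decode_numeric_entities("<name>&#65;</name>")` returns `"<name>A</name>"`, while the input from my reproduction raises `InvalidEntityError` naming the offending entity. Leaving such entities undecoded instead of raising would be another reasonable option.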
Thanks!