Fix packaging and some code.
Created by: pwnall
I read about this project today, and I think it's really cool. I'd like to play with academic papers for a toy project, and I'm sure many other PhD students would find this tool really useful.
I was very happy to see this is Ruby, and I was hoping I'd be able to use it inside a Ruby application. The code doesn't seem to be in shape for that quite yet, so I made a few changes to get it closer.
- I moved the files in `lib/` to `lib/pdf/extract`. Gems should generally not pollute the global `require` namespace.
- I updated the paths inside all the files to point to the new location.
- I added the gems needed to run `test/catalog.rb` as development dependencies in the gemspec.
- I created a simple `Gemfile` that references the gemspec, to make it easy to develop this gem and run its tests.
- I moved `assign.rb` and `train.rb` out of `bin/` and turned them into Rake tasks. It seems like they're used for development, so they shouldn't end up in the user's path when the gem is installed.
- I updated the code to work with the newer `pdf-reader` API and updated the version number. This should fix #4 (closed).
- I replaced libsvm-ruby-swig with rb-libsvm. The former crashed on my setup (ruby 2.0.0 on OSX 10.9) and hasn't been updated in 2 years. The latter has been updated this year, has a newer libsvm and, most importantly, doesn't crash Ruby.
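
For readers unfamiliar with the runtime/development split mentioned in the list above, here is a minimal sketch of how a gemspec can declare both kinds of dependency. The gem name, the version string, and the choice of `rake` as the development dependency are assumptions for illustration only; the actual gems needed by `test/catalog.rb` are not listed here.

```ruby
require 'rubygems'

# Sketch only: the name, version, and summary are placeholders, not the
# values used in this project's real gemspec.
spec = Gem::Specification.new do |s|
  s.name    = 'pdf-extract'   # assumed gem name
  s.version = '0.0.0'         # placeholder version
  s.summary = 'Placeholder summary'
  s.authors = ['Placeholder']

  # Installed for every user of the gem.
  s.add_runtime_dependency 'pdf-reader'
  s.add_runtime_dependency 'rb-libsvm'

  # Installed only when developing the gem (hypothetical example entry).
  s.add_development_dependency 'rake'
end
```

A `Gemfile` that defers to a gemspec like this typically needs only a `source` line and a call to `gemspec`, so the dependency list lives in one place.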
I was able to run `bin/pdf-extract` to extract titles and references from a PDF, and I was able to run `rake assign PDF=.....` to build training data from a PDF file. I think this means that my libsvm code changes are correct.
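
As an illustration of how a development script can become a Rake task that takes its input from a `PDF=` argument, here is a rough sketch. It is not the code in this change: the `pdf_argument` helper and the task body are hypothetical, and the real task would call into the logic that used to live in `assign.rb`.

```ruby
require 'rake' # already loaded when this code lives in a Rakefile

# Hypothetical helper: returns the path given as PDF=... on the rake
# command line, or raises if the variable was not supplied.
def pdf_argument(env = ENV)
  env.fetch('PDF') do
    raise ArgumentError, 'missing PDF=path/to/file.pdf'
  end
end

desc 'Build training data from a PDF (development only)'
task :assign do
  pdf = pdf_argument
  # Placeholder body: the real task would run the training-data code here.
  puts "would build training data from #{pdf}"
end
```

Because the task is defined in the Rakefile rather than shipped in `bin/`, it is available to developers through `rake assign` but is not installed as an executable with the gem.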