= EPUB Parser

= {doctitle}

The EPUB Parser gem parses EPUB 3 books loosely.

image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/build.svg[link="https://gitlab.com/KitaitiMakoto/epub-parser/commits/master", title="pipeline status"]
image:https://gemnasium.com/KitaitiMakoto/epub-parser.png[link="https://gitlab.com/KitaitiMakoto/epub-parser/commits/master",title="Dependency Status"]
image:https://badge.fury.io/rb/epub-parser.svg[link="https://gemnasium.com/KitaitiMakoto/epub-parser",title="Gem Version"]
image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/coverage.svg[link="https://kitaitimakoto.gitlab.io/epub-parser/coverage/",title="coverage report"]

* https://kitaitimakoto.gitlab.io/epub-parser/file.Home.html[Homepage]
* https://kitaitimakoto.gitlab.io/epub-parser/[Documentation]
* https://gitlab.com/KitaitiMakoto/epub-parser[Source Code]
* https://kitaitimakoto.gitlab.io/epub-parser/coverage/[Test Coverage]

== Installation

    gem install epub-parser

== Usage

=== As command-line tools

==== epubinfo

The `epubinfo` tool extracts and shows the metadata of the specified EPUB book.

See {file:docs/Epubinfo.markdown}.

==== epub-open

The `epub-open` tool provides an interactive shell (IRB) that helps you explore an EPUB book.

See {file:docs/EpubOpen.markdown}.

==== epub-cover

The `epub-cover` tool extracts the cover image from an EPUB book.

See {file:docs/EpubCover.adoc}.

=== As a library

First, parse the book with `EPUB::Parser.parse`:

----
require 'epub/parser'
    
book = EPUB::Parser.parse('/path/to/book.epub')
----

The book object yields pages in spine order (the spine defines the reading order chosen by the author):

----
book.each_page_on_spine do |page|
  # do something...
end
----

`page` above is an {EPUB::Publication::Package::Manifest::Item} object, and you can call {EPUB::Publication::Package::Manifest::Item#href #href} to see where the page file is located:

----
book.each_page_on_spine do |page|
  file = page.href # => path/to/page/in/zip/archive
  # Read the entry directly with Zip/Ruby (needs the zipruby gem: require 'zipruby')
  html = Zip::Archive.open('/path/to/book.epub') {|zip|
    zip.fopen(file.to_s) {|entry| entry.read}
  }
end
----

{EPUB::Publication::Package::Manifest::Item Item} also provides syntactic sugar, {EPUB::Publication::Package::Manifest::Item#read #read}, for the above:

----
html = page.read
doc = Nokogiri.HTML(html)
# do something with Nokogiri as always
----

For other utilities of Item, see the {file:docs/Item.markdown} page.
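
To make the idea of spine order concrete, the following stdlib-only sketch resolves a spine against a manifest by hand with REXML. It is an illustration under stated assumptions, not how EPUB Parser is implemented: the package document, its file names, and the helper names `child_elements` and `hrefs_in_spine_order` are all made up for this example.

```ruby
require 'rexml/document'

# A tiny, made-up package document. The manifest lists files in arbitrary
# order; the spine gives the reading order by referring to manifest item ids.
OPF = <<~XML
  <package xmlns="http://www.idpf.org/2007/opf" version="3.0">
    <manifest>
      <item id="ch2" href="text/chapter2.xhtml" media-type="application/xhtml+xml"/>
      <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
      <item id="ch1" href="text/chapter1.xhtml" media-type="application/xhtml+xml"/>
    </manifest>
    <spine>
      <itemref idref="ch1"/>
      <itemref idref="ch2"/>
    </spine>
  </package>
XML

# Hypothetical helper: direct child elements with the given local name.
def child_elements(parent, name)
  found = []
  parent.each_element { |child| found << child if child.name == name }
  found
end

# Hypothetical helper: manifest hrefs in spine (reading) order.
def hrefs_in_spine_order(opf_xml)
  package  = REXML::Document.new(opf_xml).root
  manifest = child_elements(package, 'manifest').first
  spine    = child_elements(package, 'spine').first

  # Index manifest items by id so each itemref can be resolved.
  href_by_id = child_elements(manifest, 'item').each_with_object({}) do |item, index|
    index[item.attributes['id']] = item.attributes['href']
  end

  child_elements(spine, 'itemref').map do |itemref|
    href_by_id.fetch(itemref.attributes['idref'])
  end
end

puts hrefs_in_spine_order(OPF)
# prints:
# text/chapter1.xhtml
# text/chapter2.xhtml
```

Note that `nav.xhtml` is in the manifest but not in the spine, so it does not appear in the reading order.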

Although `book` above is an {EPUB::Book} object, all of its features are provided by the {EPUB::Book::Features} module. Your own YourBook class can therefore gain the same features by including {EPUB::Book::Features}:

----
require 'epub'

class YourBook < ActiveRecord::Base
  include EPUB::Book::Features
end

book = EPUB::Parser.parse(
  'uploaded-book.epub',
  :class => YourBook # *************** pass YourBook class
)
book.instance_of? YourBook # => true
book.required = 'value for required field'
book.save!
book.each_page_on_spine do |epage|
  page = YourBookPage.create(
    :some_attr    => 'some attr',
    :content      => epage.read,
    :another_attr => 'another attr'
  )
  book.pages << page
end
----

You can also find an existing YourBook object first and pass it to the parser:

----
book = YourBook.find params[:id]
ret = EPUB::Parser.parse(
  'uploaded-book.epub',
  :book => book # ******************* pass your book instance
) # => book
ret == book # => true; I feel this API is not good... Suggestions are welcome!
# do something with your book
----

==== Switching XML Library

EPUB Parser uses https://www.nokogiri.org/[Nokogiri], a Ruby binding for http://xmlsoft.org/[Libxml2], http://xmlsoft.org/XSLT/[Libxslt] and more, if you have already installed the Nokogiri gem via RubyGems or Bundler. If Nokogiri is not available, it falls back to https://ruby-doc.org/stdlib-2.5.3/libdoc/rexml/rdoc/index.html[REXML], a library bundled with Ruby. You can also specify REXML explicitly:

----
EPUB::Parser::XMLDocument.backend = :REXML
----
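
The fallback behaviour described above follows a common Ruby pattern: try to load the preferred library and fall back when `require` raises `LoadError`. The sketch below shows that pattern in isolation. It is not EPUB Parser's actual loading code, and the method name `pick_xml_backend` is hypothetical.

```ruby
# Try each candidate library in order of preference and return the name of
# the first one that loads. The default candidates mirror the text above.
def pick_xml_backend(candidates = { Nokogiri: 'nokogiri', REXML: 'rexml/document' })
  candidates.each do |name, feature|
    begin
      require feature
      return name
    rescue LoadError
      next # this library is not installed; try the next candidate
    end
  end
  raise LoadError, 'no XML library available'
end

# Which name is printed depends on whether Nokogiri is installed.
puts pick_xml_backend
```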

==== Switching ZIP library

EPUB Parser uses https://github.com/javanthropus/archive-zip[Archive::Zip], a pure Ruby ZIP library, by default. You can use https://bitbucket.org/winebarrel/zip-ruby/wiki/Home[Zip/Ruby], a Ruby binding for https://libzip.org/[libzip], instead if you have already installed the Zip/Ruby gem via RubyGems or Bundler.

Globally:

----
EPUB::OCF::PhysicalContainer.adapter = :Zipruby
book = EPUB::Parser.parse("path/to/book.epub")
----

For each EPUB book:

----
book = EPUB::Parser.parse("path/to/book.epub", container_adapter: :Zipruby)
----
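
The two forms above amount to a global default plus a per-call override. The toy sketch below shows one way such a lookup can be arranged. `ToyContainer`, `toy_parse`, and the `:ArchiveZip` default name are hypothetical stand-ins, and the precedence shown (per-call option first, then the global setting) is an assumption about the design, not a statement about EPUB Parser's internals.

```ruby
# Hypothetical stand-in for a container class with a global default adapter.
module ToyContainer
  class << self
    attr_writer :adapter

    def adapter
      @adapter ||= :ArchiveZip # assumed name for the default adapter
    end
  end
end

# Hypothetical parse function: an adapter given per call wins over the
# global setting; otherwise the global setting is used.
def toy_parse(path, container_adapter: nil)
  { path: path, adapter: container_adapter || ToyContainer.adapter }
end

p toy_parse('book.epub')[:adapter]                                 # => :ArchiveZip
ToyContainer.adapter = :Zipruby
p toy_parse('book.epub')[:adapter]                                 # => :Zipruby
p toy_parse('book.epub', container_adapter: :ArchiveZip)[:adapter] # => :ArchiveZip
```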

== Documentation

=== APIs

More documentation is available in:

* {file:docs/Publication.markdown} covers the document's metadata, file list and so on.
* {file:docs/Item.markdown} describes the object that represents a file in an EPUB package.
* {file:docs/FixedLayout.markdown} provides APIs to declare how an EPUB reader renders content, such as in reflowable or fixed layout.
* {file:docs/Navigation.markdown} describes how to use the Navigation Document.
* {file:docs/Searcher.markdown} introduces APIs to search EPUB documents for words and elements, and to search by EPUB CFI (a position pointer for EPUB).
* {file:docs/UnpackedArchive.markdown} describes how to handle directories generated by unzipping EPUB files, instead of the EPUB files themselves.
* {file:docs/MultipleRenditions.markdown} describes EPUB Multiple-Rendition Publications and the APIs for them.

=== Examples

Example usages are listed on the {file:Examples} page.

* {file:docs/AggregateContentsFromWeb.markdown Aggregate Contents From the Web}
* {file:examples/exctract-content-using-cfi.rb Extract contents from EPUB files using EPUB CFI (identifier for EPUB)}
* {file:examples/find-elements-and-cfis.rb Find elements and CFIs}

=== Building documentation

If you installed EPUB Parser via the gem command, you can also generate the documentation on your own (the https://gitlab.com/KitaitiMakoto/rubygems-yardoc[rubygems-yardoc] gem is needed):

----
$ gem install epub-parser
$ gem yardoc epub-parser
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
YARD documentation is generated to:
/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
----

At the end, it shows the path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here).

Alternatively, you can generate the documentation from a clone of the source repository with the Rake task:

----
$ git clone https://gitlab.com/KitaitiMakoto/epub-parser.git
$ cd epub-parser
$ bundle install --path=deps
$ bundle exec rake doc:yard
...
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
----

The documentation will then be available in the `doc` directory.

== Requirements

* Ruby 2.2.0 or later

== History

See {file:CHANGELOG.adoc}.

== Note

This library is still a work in progress.
Only a few features are implemented, and the APIs might change in the future.
Please keep that in mind.

Currently implemented:

* container.xml of http://idpf.org/epub/30/spec/epub30-ocf.html#sec-container-metainf-container.xml[EPUB Open Container Format (OCF) 3.0]
* http://idpf.org/epub/30/spec/epub30-publications.html[EPUB Publications 3.0]
* EPUB Navigation Documents of http://www.idpf.org/epub/30/spec/epub30-contentdocs.html[EPUB Content Documents 3.0]
* http://www.idpf.org/epub/fxl/[EPUB 3 Fixed-Layout Documents]
* metadata.xml of http://www.idpf.org/epub/renditions/multiple/[EPUB Multiple-Rendition Publications]

== License

This library is distributed under the terms of the MIT License.
See the {file:MIT-LICENSE} file for more information.