^.*\.Rproj$
^\.Rproj\.user$
^README\.Rmd$
^README-.*\.png$
^\.travis\.yml$
^CONDUCT\.md$
^README\.html$
^cran-comments\.md$
^appveyor\.yml$
^docs$
.DS_Store
.Rproj.user
.Rhistory
.RData
cran-comments.md
language: R
sudo: required
cache: packages
r:
- oldrel
- release
- devel
# Contributor Code of Conduct
As contributors and maintainers of this project, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the Contributor Covenant
(http://contributor-covenant.org), version 1.0.0, available at
http://contributor-covenant.org/version/1/0/0/
Package: htmltidy
Title: Tidy Up and Test XPath Queries on HTML and XML Content
Version: 0.5.0
Encoding: UTF-8
Authors@R: c(
person("Bob", "Rudis", email = "[email protected]", role = c("aut", "cre")),
person("Dave", "Raggett", email = "[email protected]", role = c("ctb", "aut"),
comment="Original HTML Tidy library"),
person("Charles", "Reitzel", role = c("ctb", "aut"),
comment="Modern HTML Tidy library"),
person("Björn", "Höhrmann", role = c("ctb", "aut"), comment="HTML5 Support"),
person("Kenton","Russell", role = c("aut", "ctb"),
comment = "xml-viewer integration",
email = "[email protected]"),
person("Vadim", "Kiryukhin", role = c("ctb", "cph"),
comment = "vkbeautify library"),
person("Ivan", "Sagalaev", role = c("ctb", "cph"),
comment = "highlight.js library"),
person("Lev", "Muchnik", email = "[email protected]", role = c("ctb", "cph"),
comment = "xml-viewer library")
)
Maintainer: Bob Rudis <[email protected]>
Description: HTML documents can be beautiful and pristine. They can also be
wretched, evil, malformed demon-spawn. Now, you can tidy up that HTML and XHTML
before processing it with your favorite angle-bracket crunching tools, going beyond
the limited tidying that 'libxml2' affords in the 'XML' and 'xml2' packages and
taming even the ugliest HTML code generated by the likes of Google Docs and Microsoft
Word. It's also possible to use the functions provided to format or "pretty print"
HTML content as it is being tidied. Utilities are also included that make it
possible to view formatted and "pretty printed" HTML/XML
content from HTML/XML document objects, nodes, node sets and plain character HTML/XML
using 'vkbeautify' (by Vadim Kiryukhin) and 'highlight.js' (by Ivan Sagalaev).
Also (optionally) enables filtering of nodes via XPath or viewing an HTML/XML document
in "tree" view using 'XMLDisplay' (by Lev Muchnik). See
<https://github.com/vkiryukhin/vkBeautify> and
<http://www.levmuchnik.net/Content/ProgrammingTips/WEB/XMLDisplay/DisplayXMLFileWithJavascript.html>
for more information about 'vkbeautify' and 'XMLDisplay', respectively.
Copyright: file inst/COPYRIGHTS
URL: https://gitlab.com/hrbrmstr/htmltidy
BugReports: https://gitlab.com/hrbrmstr/htmltidy/issues
Depends:
R (>= 3.2.0)
License: MIT + file LICENSE
LazyData: true
NeedsCompilation: yes
Suggests:
testthat,
httr,
rvest
LinkingTo: Rcpp
Imports:
Rcpp,
xml2,
XML,
htmlwidgets,
htmltools
RoxygenNote: 6.1.1
YEAR: 2016
COPYRIGHT HOLDER: Bob Rudis
# Generated by roxygen2: do not edit by hand
S3method(tidy_html,HTMLInternalDocument)
S3method(tidy_html,character)
S3method(tidy_html,connection)
S3method(tidy_html,default)
S3method(tidy_html,raw)
S3method(tidy_html,response)
S3method(tidy_html,xml_document)
export(highlight_styles)
export(html_tree_view)
export(html_view)
export(renderXmltreeview)
export(renderXmlview)
export(tidy_html)
export(xml_tree_view)
export(xml_view)
export(xmltreeviewOutput)
export(xmlviewOutput)
import(XML)
import(htmltools)
import(htmlwidgets)
import(xml2)
importFrom(Rcpp,sourceCpp)
useDynLib(htmltidy, .registration=TRUE)
htmltidy 0.3.1
====================
* Fix warnings coming from URL redirection in examples
htmltidy 0.3.0
====================
* Better error handling (fixed crashing bug in #1)
* New option to display document errors
* Support for directly tidying httr::response objects
* Added XML/HTML viewer & XPath query widgets
htmltidy 0.2.0
====================
* Bundled tidy-html5 library with the package
* Windows compatibility
* Options handling
* Enabled generics
* Modified tests
htmltidy 0.1.0
====================
* Added a `NEWS.md` file to track changes to the package.
* Added Debian & Ubuntu compatibility
* Added basic error checking
* Added basic test harness
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
do_the_tidy <- function(source, options, show_errors) {
.Call(`_htmltidy_do_the_tidy`, source, options, show_errors)
}
#' Tidy Up and Test XPath Queries on HTML and XML Content
#'
#' HTML documents can be beautiful and pristine. They can also be
#' wretched, evil, malformed demon-spawn. Now, you can tidy up that HTML and XHTML
#' before processing it with your favorite angle-bracket crunching tools, going beyond
#' the limited tidying that 'libxml2' affords in the 'XML' and 'xml2' packages and
#' taming even the ugliest HTML code generated by the likes of Google Docs and Microsoft
#' Word. It's also possible to use the functions provided to format or "pretty print"
#' HTML content as it is being tidied. Utilities are also included that make it
#' possible to view formatted and "pretty printed" HTML/XML
#' content from HTML/XML document objects, nodes, node sets and plain character HTML/XML
#' using 'vkbeautify' (by Vadim Kiryukhin) and 'highlight.js' (by Ivan Sagalaev).
#' Also (optionally) enables filtering of nodes via XPath or viewing an XML document
#' in "tree" view using 'xml-viewer' (by Julian Gruber). See
#' \url{https://github.com/vkiryukhin/vkBeautify} and
#' \url{https://github.com/juliangruber/xml-viewer} for more information about 'vkbeautify'
#' and 'xml-viewer', respectively.
#'
#' @name htmltidy
#' @docType package
#' @author Bob Rudis ([email protected]@rud.is)
#' @importFrom Rcpp sourceCpp
#' @import xml2 XML htmlwidgets htmltools
#' @useDynLib htmltidy, .registration=TRUE
NULL
#' @export
#' @rdname tidy_html
tidy_html.response <- function(content, options=list(TidyXhtmlOut=TRUE),
verbose=FALSE) {
if (!grepl("html", content$headers[["content-type"]])) {
stop("htmltidy only parses HTML content from httr::response objects",
call.=FALSE)
}
html_txt <- suppressMessages(httr::content(content, as="text"))
tidy_html(html_txt, options=options, verbose=verbose)
}
#' Shiny bindings for xmltreeview
#'
#' Output and render functions for using xmltreeview within Shiny
#' applications and interactive Rmd documents.
#'
#' @param outputId output variable to read from
#' @param width,height Must be a valid CSS unit (like \code{'100\%'},
#' \code{'400px'}, \code{'auto'}) or a number, which will be coerced to a
#' string and have \code{'px'} appended.
#' @param expr An expression that generates an xmltreeview
#' @param env The environment in which to evaluate \code{expr}.
#' @param quoted Is \code{expr} a quoted expression (with \code{quote()})? This
#' is useful if you want to save an expression in a variable.
#'
#' @name xmltreeview-shiny
#'
#' @export
xmltreeviewOutput <- function(outputId, width = '100%', height = '400px'){
htmlwidgets::shinyWidgetOutput(outputId, 'xmltreeview', width, height,
package = 'htmltidy')
}
#' @rdname xmltreeview-shiny
#' @export
renderXmltreeview <- function(expr, env = parent.frame(), quoted = FALSE) {
if (!quoted) { expr <- substitute(expr) } # force quoted
htmlwidgets::shinyRenderWidget(expr, xmltreeviewOutput, env, quoted = TRUE)
}
#' Widget output function for use in Shiny
#'
#' @param outputId outputId
#' @param width width
#' @param height height
#' @export
xmlviewOutput <- function(outputId, width = '100%', height = '400px'){
htmlwidgets::shinyWidgetOutput(outputId, 'xmlview', width, height,
package = 'htmltidy')
}
#' Widget render function for use in Shiny
#'
#' @param expr expr
#' @param env env
#' @param quoted quoted
#' @export
renderXmlview <- function(expr, env = parent.frame(), quoted = FALSE) {
if (!quoted) { expr <- substitute(expr) } # force quoted
htmlwidgets::shinyRenderWidget(expr, xmlviewOutput, env, quoted = TRUE)
}
#' Tidy or "Pretty Print" HTML/XHTML Documents
#'
#' Pass in HTML content as either plain or raw text or parsed objects (either with the
#' \code{XML} or \code{xml2} packages) or as an \code{httr} \code{response} object
#' along with an options list that specifies how the content will be tidied and get back
#' tidied content of the same object type as passed in to the function.
#'
#' The default option \code{TidyXhtmlOut} will convert the input content to XHTML.
#'
#' Currently supported options:
#'
#' \itemize{
#' \item{Ones taking a logical value: }{\code{TidyAltText}, \code{TidyBodyOnly}, \code{TidyBreakBeforeBR},
#' \code{TidyCoerceEndTags}, \code{TidyDropEmptyElems}, \code{TidyDropEmptyParas},
#' \code{TidyFixBackslash}, \code{TidyFixComments}, \code{TidyGDocClean}, \code{TidyHideComments},
#' \code{TidyHtmlOut}, \code{TidyIndentContent}, \code{TidyJoinClasses}, \code{TidyJoinStyles},
#' \code{TidyLogicalEmphasis}, \code{TidyMakeBare}, \code{TidyMakeClean}, \code{TidyMark},
#' \code{TidyOmitOptionalTags}, \code{TidyReplaceColor}, \code{TidyUpperCaseAttrs},
#' \code{TidyUpperCaseTags}, \code{TidyWord2000}, \code{TidyXhtmlOut}}
#' \item{Ones taking a character value: }{\code{TidyDoctype}, \code{TidyInlineTags}, \code{TidyBlockTags},
#' \code{TidyEmptyTags}, \code{TidyPreTags}}
#' \item{Ones taking an integer value: }{\code{TidyIndentSpaces}, \code{TidyTabSize}, \code{TidyWrapLen}}
#' }
#'
#' File \href{https://github.com/hrbrmstr/htmltidy/issues}{an issue} if there are other \code{libtidy}
#' options you'd like supported.
#'
#' It is likely that the most used options will be:
#'
#' \itemize{
#' \item{\code{TidyXhtmlOut} (logical)},
#' \item{\code{TidyHtmlOut} (logical)} and
#' \item{\code{TidyDocType} which should be one of "\code{omit}",
#' "\code{html5}", "\code{auto}", "\code{strict}" or "\code{loose}"}.
#' }
#'
#' You can clean up Microsoft Word (2000) and Google Docs HTML via logical settings for
#' \code{TidyWord2000} and \code{TidyGDocClean}, respectively.
#'
#' It may also be advantageous to remove all comments with \code{TidyHideComments}.
#'
#' @param content accepts a character vector, raw vector or parsed content from the \code{xml2}
#' or \code{XML} packages.
#' @param options named list of options
#' @param verbose output document errors? (default: \code{FALSE})
#' @note If document parsing errors are severe enough, \code{tidy_html()} will not be able
#' to clean the document and will display the errors (this output can be captured with
#' \code{sink()} or \code{capture.output()}) along with a warning and return a "best effort"
#' cleaned version of the document.
#' @return Tidied HTML/XHTML content. The object type will be the same as that of the input type
#' except when it is a \code{connection}, then a character vector will be returned.
#' @references \url{http://api.html-tidy.org/tidy/quickref_5.1.25.html} &
#' \url{https://github.com/htacg/tidy-html5/blob/master/include/tidyenum.h}
#' for definitions of the options supported above and \url{https://www.w3.org/People/Raggett/tidy/}
#' for an explanation of what "tidy" HTML is and some canonical examples of what it can do.
#' @export
#' @examples
#' opts <- list(
#' TidyDocType="html5",
#' TidyMakeClean=TRUE,
#' TidyHideComments=TRUE,
#' TidyIndentContent=TRUE,
#' TidyWrapLen=200
#' )
#'
#' txt <- paste0(
#' c("<html><head><style>p { color: red; }</style><body><!-- ===== body ====== -->",
#' "<p>Test</p></body><!--Default Zone --> <!--Default Zone End--></html>"),
#' collapse="")
#'
#' cat(tidy_html(txt, options=opts))
#'
#' \dontrun{
#' library(httr)
#' res <- GET("https://rud.is/test/untidy.html")
#'
#' # look at the original, un-tidy source
#' cat(content(res, as="text", encoding="UTF-8"))
#'
#' # see the tidied version
#' cat(tidy_html(content(res, as="text", encoding="UTF-8"),
#' list(TidyDocType="html5", TidyWrapLen=200)))
#'
#' # but, you could also just do:
#' cat(tidy_html(url("https://rud.is/test/untidy.html")))
#' }
tidy_html <- function(content, options=list(TidyXhtmlOut=TRUE), verbose=FALSE) {
UseMethod("tidy_html")
}
#' @export
#' @rdname tidy_html
tidy_html.default <- function(content, options=list(TidyXhtmlOut=TRUE),
verbose=FALSE) {
content <- paste0(content, collapse="")
.Call('_htmltidy_do_the_tidy', PACKAGE='htmltidy',
source=content, options=options, show_errors=verbose)
}
#' @export
#' @rdname tidy_html
tidy_html.character <- function(content, options=list(TidyXhtmlOut=TRUE),
verbose=FALSE) {
content <- paste0(content, collapse="")
.Call('_htmltidy_do_the_tidy', PACKAGE='htmltidy',
source=content, options=options, show_errors=verbose)
}
#' @export
#' @rdname tidy_html
tidy_html.raw <- function(content, options=list(TidyXhtmlOut=TRUE),
verbose=FALSE) {
# convert the whole raw vector (not just its first byte) to text
content <- iconv(rawToChar(content), to="UTF-8")
out <- .Call('_htmltidy_do_the_tidy', PACKAGE='htmltidy',
source=content, options=options, show_errors=verbose)
charToRaw(out)
}
#' @export
#' @rdname tidy_html
tidy_html.xml_document <- function(content, options=list(TidyXhtmlOut=TRUE),
verbose=FALSE) {
content <- toString(content)
out <- .Call('_htmltidy_do_the_tidy', PACKAGE='htmltidy',
source=content, options=options, show_errors=verbose)
xml2::read_html(out)
}
#' @export
#' @rdname tidy_html
tidy_html.HTMLInternalDocument <- function(content, options=list(TidyXhtmlOut=TRUE),
verbose=FALSE) {
content <- XML::saveXML(content)
out <- .Call('_htmltidy_do_the_tidy', PACKAGE='htmltidy',
source=content, options=options, show_errors=verbose)
XML::htmlParse(out)
}
#' @export
#' @rdname tidy_html
tidy_html.connection <- function(content, options=list(TidyXhtmlOut=TRUE),
verbose=FALSE) {
html <- paste0(readLines(content, warn=FALSE), collapse="")
close(content)
.Call('_htmltidy_do_the_tidy', PACKAGE='htmltidy',
source=html, options=options, show_errors=verbose)
}
#' HTML/XML tree viewer
#'
#' This uses the \code{xml-viewer} JavaScript module to provide a simple collapsible
#' tree viewer for HTML/XML documents, nodes, node sets and plain character
#' HTML/XML in an \code{htmlwidget} pane.
#'
#' @md
#' @param doc \code{xml2} document/node/nodeset, an \code{HTMLInternalDocument}/
#' \code{XMLInternalDocument} or atomic character vector of HTML/XML content
#' @param mode viewer mode. `traditional` uses tag notation; `modern` favors readability
#' over angle brackets.
#' @param scroll should the \code{<div>} holding the HTML/XML content scroll
#' (\code{TRUE}) or take up the full viewer/browser window (\code{FALSE}).
#' Default is \code{FALSE} (take up the full viewer/browser window). If
#' this is set to \code{TRUE}, \code{height} should be set to a value
#' other than \code{NULL}.
#' @param width widget \code{div} width
#' @param height widget \code{div} height
#' @note Large HTML or XML content may take some time to render properly. It is suggested
#' that this function be used on as small a subset of HTML/XML as possible
#' or used in a browser context rather than an IDE viewer context.
#' @export
#' @references \href{http://www.lexiconista.com/xonomy/}{xonomy xml viewer}
#' @examples
#' if(interactive()) {
#' txt <- paste0("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>",
#' "<body>Don't forget me this weekend!</body></note>")
#' # xml_tree_view(txt)
#' }
xml_tree_view <- function(doc=NULL, mode=c("traditional", "modern"), scroll=FALSE, width="100%", height=NULL) {
if (inherits(doc, "character")) {
doc <- paste0(doc, collapse="")
} else if (inherits(doc, "xml_nodeset")) {
doc <- paste0(as.character(doc), collapse="")
} else if (inherits(doc, "xml_document") | inherits(doc, "xml_node")) {
doc <- as.character(doc)
} else if (inherits(doc, "HTMLInternalDocument") |
inherits(doc, "XMLInternalDocument")) {
doc <- XML::saveXML(doc)
}
mode <- match.arg(trimws(tolower(mode)), c("traditional", "modern"))
mode <- unname(c("traditional"="nerd", "modern"="laic")[mode])
params <- list(
xmlDoc = doc,
mode = mode,
scroll = scroll
)
# create widget
htmlwidgets::createWidget(
name = 'xmltreeview',
x = params,
width = width,
height = height,
package = 'htmltidy'
)
}
#' @rdname xml_tree_view
#' @export
html_tree_view <- xml_tree_view
#' HTML/XML pretty printer and viewer
#'
#' This uses the \code{vkbeautify} and \code{highlight.js} javascript modules to format and
#' "pretty print" HTML/XML documents, nodes, node sets and plain character
#' HTML/XML in an \code{htmlwidget} pane.
#'
#' @param doc \code{xml2} document/node/nodeset, an \code{HTMLInternalDocument}/
#' \code{XMLInternalDocument} or atomic character vector of HTML/XML content
#' @param style CSS stylesheet to use (see \code{highlight_styles()})
#' @param scroll should the \code{<div>} holding the HTML/XML content scroll
#' (\code{TRUE}) or take up the full viewer/browser window (\code{FALSE}).
#' Default is \code{FALSE} (take up the full viewer/browser window). If
#' this is set to \code{TRUE}, \code{height} should be set to a value
#' other than \code{NULL}.
#' @param add_filter show an XPath input box to enable live filtering?
#' (default: \code{FALSE})
#' @param apply_xpath Add and apply an XPath query string to the view. If
#' \code{add_filter} is \code{TRUE} then this query string will
#' appear in the filter box and be applied to the passed in document.
#' @param width widget width (best to keep it at 100\%)
#' @param height widget height (kinda only useful for knitting since this is
#' meant to be an interactive tool).
#' @note Large HTML or XML content may take some time to render properly. It is suggested
#' that this function be used on as small a subset of HTML/XML as possible
#' or used in a browser context rather than an IDE viewer context.
#' @export
#' @references \href{https://highlightjs.org/}{highlight.js},
#' \href{http://www.eslinstructor.net/vkbeautify/}{vkbeautify}
#' @examples
#' if (interactive()) {
#' txt <- paste0("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>",
#' "<body>Don't forget me this weekend!</body></note>")
#' # xml_view(txt)
#' }
xml_view <- function(doc, style="default", scroll=FALSE, add_filter=FALSE,
apply_xpath = NULL, width="100%", height=NULL) {
xml_doc_name <- "doc"
if (!inherits(doc, "character") &
inherits(substitute(doc), "name")) {
xml_doc_name <- deparse(substitute(doc))
}
style <- trimws(tolower(style))
if (!style %in% highlight_styles()) {
# warn before overwriting so the message names the style that was requested
warning(sprintf("Style '%s' not found, using 'default'", style))
style <- "default"
}
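if (inherits(doc, "character")) {
doc <- paste0(doc, collapse="")
} else if (inherits(doc, "xml_nodeset")) {
doc <- paste0(as.character(doc), collapse="")
} else if (inherits(doc, "xml_document") | inherits(doc, "xml_node")) {
doc <- as.character(doc)
} else if (inherits(doc, "HTMLInternalDocument") |
inherits(doc, "XMLInternalDocument")) {
# documented input type; serialize the same way xml_tree_view() does
doc <- XML::saveXML(doc)
}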
params <- list(
xmlDoc = doc,
styleSheet = style,
addFilter = add_filter,
applyXPath = apply_xpath,
scroll = scroll,
xmlDocName = xml_doc_name
)
htmlwidgets::createWidget(
name = 'xmlview',
x = params,
width = width,
height = height,
package = 'htmltidy'
)
}
#' @rdname xml_view
#' @export
html_view <- xml_view
#' List available HTML/XML highlight styles
#'
#' Returns a character vector of available style sheets to use when displaying
#' an XML document.
#'
#' @references See \url{https://highlightjs.org/static/demo/} for a demo of all
#' highlight.js styles
#' @export
#' @examples
#' highlight_styles()
highlight_styles <- function() {
gsub("\\.css$", "",
grep("\\.css$",
list.files(system.file("htmlwidgets/lib/highlightjs/styles", package="htmltidy")),
value=TRUE))
}
---
output: rmarkdown::github_document
editor_options:
chunk_output_type: console
---
```{r pkg-knitr-opts, include=FALSE}
hrbrpkghelpr::global_opts()
```
```{r badges, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::stinking_badges()
```
```{r description, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::yank_title_and_description()
```
Partly inspired by [this SO question](http://stackoverflow.com/questions/37061873/identify-a-weblink-in-bold-in-r) and because there's a great deal of cruddy HTML out there that needs fixing to use properly when scraping data.
It relies on a locally included version of [`libtidy`](http://www.html-tidy.org/) and works on macOS, Linux & Windows.
It also incorporates an `htmlwidget` to view and test XPath queries on HTML/XML content and another widget to view an XML document in a collapsible tree view.
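A minimal sketch of calling both widgets, using only arguments documented for `xml_view()` and `xml_tree_view()`. The sample XML and the XPath query `//to` are illustrative choices, and the guard is an assumption added here so the snippet does nothing outside an interactive session or when the package is not installed:

```r
# Sample XML borrowed from the package's own examples
txt <- paste0("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>",
              "<body>Don't forget me this weekend!</body></note>")

# Widgets only make sense interactively; skip quietly otherwise
if (interactive() && requireNamespace("htmltidy", quietly = TRUE)) {
  # Pretty-printed view with a live XPath filter box, pre-filled with "//to"
  print(htmltidy::xml_view(txt, add_filter = TRUE, apply_xpath = "//to"))

  # Collapsible tree view in the "modern" mode
  print(htmltidy::xml_tree_view(txt, mode = "modern"))
}
```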
## What's Inside The Tin
```{r ingredients, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::describe_ingredients()
```
## Installation
```{r install-ex, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::install_block()
```
## Usage
```{r usage}
library(htmltidy)
# current version
packageVersion("htmltidy")
library(XML)
library(xml2)
library(httr)
library(purrr)
```
This is really "un-tidy" content:
```{r untidy-01}
res <- GET("https://rud.is/test/untidy.html")
cat(content(res, as="text"))
```
Let's see what `tidy_html()` does to it.
It can handle the `response` object directly:
```{r tidy-01}
cat(tidy_html(res, list(TidyDocType="html5", TidyWrapLen=200)))
```
But, you'll probably mostly use it on HTML you've identified as gnarly and already have that HTML text content handy:
```{r options-01}
cat(tidy_html(content(res, as="text"), list(TidyDocType="html5", TidyWrapLen=200)))
```
NOTE: you could also just have done:
```{r options-02}
cat(tidy_html(url("https://rud.is/test/untidy.html"),
list(TidyDocType="html5", TidyWrapLen=200)))
```
You'll see that this differs substantially from the mangling `libxml2` does (via `read_html()`):
```{r options-03}
pg <- read_html("https://rud.is/test/untidy.html")
cat(toString(pg))
```
It can also deal with "raw" and parsed objects:
```{r raw-01}
tidy_html(content(res, as="raw"))
tidy_html(content(res, as="text", encoding="UTF-8"))
tidy_html(content(res, as="parsed", encoding="UTF-8"))
tidy_html(suppressWarnings(htmlParse("https://rud.is/test/untidy.html")))
```
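The reason one call works across these input types is S3 dispatch: `tidy_html()` is a generic with one method per class. The following is a rough base R sketch of that pattern, not the package's code. `tidy_sketch`, its methods, and the hand-built `fake_res` object (including its `body` field) are hypothetical stand-ins:

```r
# Hypothetical generic: R picks a method based on class(content)
tidy_sketch <- function(content, ...) UseMethod("tidy_sketch")

# Character input: collapse to a single string and tag it
tidy_sketch.character <- function(content, ...) {
  paste0("character:", paste0(content, collapse = ""))
}

# Response-like input: check the content type, then hand the text on
tidy_sketch.response <- function(content, ...) {
  ctype <- content$headers[["content-type"]]
  if (!grepl("html", ctype)) {
    stop("only HTML content is supported", call. = FALSE)
  }
  tidy_sketch(content$body, ...)
}

# Stand-in for an httr response (assumption: real objects are richer)
fake_res <- structure(
  list(headers = list(`content-type` = "text/html; charset=UTF-8"),
       body = "<p>hi</p>"),
  class = "response"
)

dispatch_result <- tidy_sketch(fake_res)
```

Calling the generic on `fake_res` goes through the `response` method first and then re-dispatches to the `character` method.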
And, show the markup errors:
```{r errors-01}
invisible(tidy_html(url("https://rud.is/test/untidy.html"), verbose=TRUE))
```
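The chunk above combines a connection as input with printed diagnostics. The function documentation notes that this printed output can be captured with `sink()` or `capture.output()`. A base R sketch of both ideas, where `noisy_tidy()` and its diagnostic text are hypothetical stand-ins for the real function and its messages:

```r
# Hypothetical stand-in for a tidying function that prints diagnostics
noisy_tidy <- function(con) {
  # Same reading pattern as the package's connection method:
  # read all lines, join them into one string, then close the connection
  html <- paste0(readLines(con, warn = FALSE), collapse = "")
  close(con)
  cat("example diagnostic: input had", nchar(html), "characters\n")
  html
}

# An in-memory connection stands in for url(...)
con <- textConnection(c("<html>", "<body><p>Test</p>", "</body></html>"))

# capture.output() collects what was printed; the assignment keeps the value
msgs <- capture.output(res_html <- noisy_tidy(con))
```

Note that joining lines with `collapse = ""` drops the original line breaks.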
## Testing Options
```{r more-options-01}
opts <- list(TidyDocType="html5",
TidyMakeClean=TRUE,
TidyHideComments=TRUE,
TidyIndentContent=FALSE,
TidyWrapLen=200)
txt <- "<html>
<head>
<style>
p { color: red; }
</style>
<body>
<!-- ===== body ====== -->
<p>Test</p>
</body>
<!--Default Zone
-->
<!--Default Zone End-->
</html>"
cat(tidy_html(txt, options=opts))
```
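Since options are documented as taking logical, character, or integer values, it can help to check an options list before passing it along. This is a sketch only: `check_opts()` and the `expected` table are hypothetical helpers covering a handful of the documented option names, and whether the package itself performs such validation is not shown here.

```r
# Expected value types for a few documented options (illustrative subset)
expected <- list(
  TidyMakeClean     = is.logical,
  TidyHideComments  = is.logical,
  TidyIndentContent = is.logical,
  TidyWrapLen       = is.numeric,
  TidyIndentSpaces  = is.numeric,
  TidyBlockTags     = is.character
)

# Hypothetical helper: report options with the wrong type and
# options that the table above does not cover
check_opts <- function(opts, expected) {
  known <- intersect(names(opts), names(expected))
  ok <- vapply(known, function(nm) expected[[nm]](opts[[nm]]), logical(1))
  list(
    bad_type  = known[!ok],
    unchecked = setdiff(names(opts), names(expected))
  )
}

my_opts <- list(TidyMakeClean = TRUE, TidyWrapLen = "200", TidyDocType = "html5")
report <- check_opts(my_opts, expected)
```

Here `report$bad_type` flags `TidyWrapLen` because it was given as a string, and `report$unchecked` lists names missing from the illustrative table.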
But, you're probably better off running it on plain HTML source.
Since it's C/C++-backed, it's pretty fast:
```{r speed-01}
book <- readLines("http://singlepageappbook.com/single-page.html")
sum(map_int(book, nchar))
system.time(tidy_book <- tidy_html(book))
```
(It's usually between 20 & 25 milliseconds to process those 202 kilobytes of HTML.) Not too shabby.
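A single `system.time()` call can be noisy, so one way to get a steadier figure is to repeat the call and take the median. The sketch below uses a hypothetical helper, `time_ms()`, and a trivial stand-in workload instead of `tidy_html()` so that it runs without the package or network access:

```r
# Hypothetical helper: median elapsed time of f(...) in milliseconds
time_ms <- function(f, ..., times = 20L) {
  elapsed <- vapply(seq_len(times), function(i) {
    system.time(f(...))[["elapsed"]]
  }, numeric(1))
  stats::median(elapsed) * 1000
}

# Stand-in workload: collapse 1000 short lines into one string
lines_in <- rep("<p>some text</p>", 1000)
ms <- time_ms(function(x) paste0(x, collapse = ""), lines_in)
```

With the package installed, the same helper could be pointed at `tidy_html` and the `book` vector from the chunk above.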
## htmltidy Metrics
```{r cloc, echo=FALSE}
cloc::cloc_pkg_md()
```
## Code of Conduct
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
# DO NOT CHANGE the "init" and "install" sections below
# Download script file from GitHub
init:
ps: |
$ErrorActionPreference = "Stop"
Invoke-WebRequest http://raw.github.com/krlmlr/r-appveyor/master/scripts/appveyor-tool.ps1 -OutFile "..\appveyor-tool.ps1"
Import-Module '..\appveyor-tool.ps1'
install:
ps: Bootstrap
# Adapt as necessary starting from here
build_script:
- travis-tool.sh install_deps
test_script:
- travis-tool.sh run_tests
on_failure:
- 7z a failure.zip *.Rcheck\*
- appveyor PushArtifact failure.zip
artifacts:
- path: '*.Rcheck\**\*.log'
name: Logs
- path: '*.Rcheck\**\*.out'
name: Logs
- path: '*.Rcheck\**\*.fail'
name: Logs
- path: '*.Rcheck\**\*.Rout'
name: Logs
- path: '\*_*.tar.gz'
name: Bits
- path: '\*_*.zip'
name: Bits
## Test environments
* local OS X install, R 3.6.1
* ubuntu 14.04 (on travis-ci), R 3.6.1
* win-builder (devel and release)
## R CMD check results
0 errors | 0 warnings | 1 note
* This is a new release.
<!-- Generated by pkgdown: do not edit by hand -->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>License • htmltidy</title>
<!-- jquery -->
<script src="https://code.jquery.com/jquery-3.1.0.min.js" integrity="sha384-nrOSfDHtoPMzJHjVTdCopGqIqeYETSXhZDFyniQ8ZHcVy08QesyHcnOUpMpqnmWq" crossorigin="anonymous"></script>
<!-- Bootstrap -->
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<!-- Font Awesome icons -->
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-T8Gy5hrqNKT+hzMclPo118YTQO6cYprQmhrYwIiQ/3axmI1hQomh7Ud2hPOy8SP1" crossorigin="anonymous">
<!-- pkgdown -->
<link href="pkgdown.css" rel="stylesheet">
<script src="jquery.sticky-kit.min.js"></script>
<script src="pkgdown.js"></script>
<!-- mathjax -->
<script src='https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script>
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="container template-license">
<header>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">htmltidy</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>
<a href="reference/index.html">Reference</a>
</li>
<li>
<a href="news/index.html">News</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a href="https://github.com/hrbrmstr/htmltidy">
<span class="fa fa-github fa-lg"></span>
</a>
</li>
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
</header>
<div class="row">
<div class="contents col-md-12">
<div class="page-header">
<h1>License</h1>
</div>
<pre>YEAR: 2016
COPYRIGHT HOLDER: Bob Rudis
</pre>
</div>
</div>
<footer>
<div class="copyright">
<p>Developed by Bob Rudis, Dave Raggett, Charles Reitzel, Björn Höhrmann, Kenton Russell.</p>
</div>
<div class="pkgdown">
<p>Site built with <a href="http://hadley.github.io/pkgdown/">pkgdown</a>.</p>
</div>
</footer>
</div>
</body>
</html>
<!-- Generated by pkgdown: do not edit by hand -->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Authors • htmltidy</title>
<!-- jquery -->
<script src="https://code.jquery.com/jquery-3.1.0.min.js" integrity="sha384-nrOSfDHtoPMzJHjVTdCopGqIqeYETSXhZDFyniQ8ZHcVy08QesyHcnOUpMpqnmWq" crossorigin="anonymous"></script>
<!-- Bootstrap -->
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<!-- Font Awesome icons -->
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-T8Gy5hrqNKT+hzMclPo118YTQO6cYprQmhrYwIiQ/3axmI1hQomh7Ud2hPOy8SP1" crossorigin="anonymous">
<!-- pkgdown -->
<link href="pkgdown.css" rel="stylesheet">
<script src="jquery.sticky-kit.min.js"></script>
<script src="pkgdown.js"></script>
<!-- mathjax -->
<script src='https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script>
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="container template-authors">
<header>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">htmltidy</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>
<a href="reference/index.html">Reference</a>
</li>
<li>
<a href="news/index.html">News</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a href="https://github.com/hrbrmstr/htmltidy">
<span class="fa fa-github fa-lg"></span>
</a>
</li>
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
</header>
<div class="row">
<div class="contents col-md-12">
<div class="page-header">
<h1>Authors</h1>
</div>
<ul class="list-unstyled">
<li>
<p><strong>Bob Rudis</strong>. Author, maintainer.
</p>
</li>
<li>
<p><strong>Dave Raggett</strong>. Contributor, author.
<br /><small>Original HTML Tidy library</small></p>
</li>
<li>
<p><strong>Charles Reitzel</strong>. Contributor, author.
<br /><small>Modern HTML Tidy library</small></p>
</li>
<li>
<p><strong>Björn Höhrmann</strong>. Contributor, author.
<br /><small>HTML5 Support</small></p>
</li>
<li>
<p><strong>Kenton Russell</strong>. Author, contributor.
<br /><small>xml-viewer integration</small></p>
</li>
<li>
<p><strong>Vadim Kiryukhin</strong>. Contributor, copyright&nbsp;holder.
<br /><small>vkbeautify library</small></p>
</li>
<li>
<p><strong>Ivan Sagalaev</strong>. Contributor, copyright&nbsp;holder.
<br /><small>highlight.js library</small></p>
</li>
<li>
<p><strong>Lev Muchnik</strong>. Contributor, copyright&nbsp;holder.
<br /><small>xml-viewer library</small></p>
</li>
</ul>
</div>
</div>
<footer>
<div class="copyright">
<p>Developed by Bob Rudis, Dave Raggett, Charles Reitzel, Björn Höhrmann, Kenton Russell.</p>
</div>
<div class="pkgdown">
<p>Site built with <a href="http://hadley.github.io/pkgdown/">pkgdown</a>.</p>
</div>
</footer>
</div>
</body>
</html>
/*
Sticky-kit v1.1.2 | WTFPL | Leaf Corcoran 2015 | http://leafo.net
*/
(function(){var b,f;b=this.jQuery||window.jQuery;f=b(window);b.fn.stick_in_parent=function(d){var A,w,J,n,B,K,p,q,k,E,t;null==d&&(d={});t=d.sticky_class;B=d.inner_scrolling;E=d.recalc_every;k=d.parent;q=d.offset_top;p=d.spacer;w=d.bottoming;null==q&&(q=0);null==k&&(k=void 0);null==B&&(B=!0);null==t&&(t="is_stuck");A=b(document);null==w&&(w=!0);J=function(a,d,n,C,F,u,r,G){var v,H,m,D,I,c,g,x,y,z,h,l;if(!a.data("sticky_kit")){a.data("sticky_kit",!0);I=A.height();g=a.parent();null!=k&&(g=g.closest(k));
if(!g.length)throw"failed to find stick parent";v=m=!1;(h=null!=p?p&&a.closest(p):b("<div />"))&&h.css("position",a.css("position"));x=function(){var c,f,e;if(!G&&(I=A.height(),c=parseInt(g.css("border-top-width"),10),f=parseInt(g.css("padding-top"),10),d=parseInt(g.css("padding-bottom"),10),n=g.offset().top+c+f,C=g.height(),m&&(v=m=!1,null==p&&(a.insertAfter(h),h.detach()),a.css({position:"",top:"",width:"",bottom:""}).removeClass(t),e=!0),F=a.offset().top-(parseInt(a.css("margin-top"),10)||0)-q,
u=a.outerHeight(!0),r=a.css("float"),h&&h.css({width:a.outerWidth(!0),height:u,display:a.css("display"),"vertical-align":a.css("vertical-align"),"float":r}),e))return l()};x();if(u!==C)return D=void 0,c=q,z=E,l=function(){var b,l,e,k;if(!G&&(e=!1,null!=z&&(--z,0>=z&&(z=E,x(),e=!0)),e||A.height()===I||x(),e=f.scrollTop(),null!=D&&(l=e-D),D=e,m?(w&&(k=e+u+c>C+n,v&&!k&&(v=!1,a.css({position:"fixed",bottom:"",top:c}).trigger("sticky_kit:unbottom"))),e<F&&(m=!1,c=q,null==p&&("left"!==r&&"right"!==r||a.insertAfter(h),
h.detach()),b={position:"",width:"",top:""},a.css(b).removeClass(t).trigger("sticky_kit:unstick")),B&&(b=f.height(),u+q>b&&!v&&(c-=l,c=Math.max(b-u,c),c=Math.min(q,c),m&&a.css({top:c+"px"})))):e>F&&(m=!0,b={position:"fixed",top:c},b.width="border-box"===a.css("box-sizing")?a.outerWidth()+"px":a.width()+"px",a.css(b).addClass(t),null==p&&(a.after(h),"left"!==r&&"right"!==r||h.append(a)),a.trigger("sticky_kit:stick")),m&&w&&(null==k&&(k=e+u+c>C+n),!v&&k)))return v=!0,"static"===g.css("position")&&g.css({position:"relative"}),
a.css({position:"absolute",bottom:d,top:"auto"}).trigger("sticky_kit:bottom")},y=function(){x();return l()},H=function(){G=!0;f.off("touchmove",l);f.off("scroll",l);f.off("resize",y);b(document.body).off("sticky_kit:recalc",y);a.off("sticky_kit:detach",H);a.removeData("sticky_kit");a.css({position:"",bottom:"",top:"",width:""});g.position("position","");if(m)return null==p&&("left"!==r&&"right"!==r||a.insertAfter(h),h.remove()),a.removeClass(t)},f.on("touchmove",l),f.on("scroll",l),f.on("resize",
y),b(document.body).on("sticky_kit:recalc",y),a.on("sticky_kit:detach",H),setTimeout(l,0)}};n=0;for(K=this.length;n<K;n++)d=this[n],J(b(d));return this}}).call(this);
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 19.2.1, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
viewBox="0 0 20 20" style="enable-background:new 0 0 20 20;" xml:space="preserve">
<style type="text/css">
.st0{fill:#75AADB;}
</style>
<path class="st0" d="M4,11.3h1.3v1.3H4c-2,0-4-2.3-4-4.7s2.1-4.7,4-4.7h5.3c1.9,0,4,2.3,4,4.7c0,1.9-1.2,3.6-2.7,4.3v-1.5
C11.4,10.2,12,9.1,12,8c0-1.7-1.4-3.3-2.7-3.3H4C2.7,4.7,1.3,6.3,1.3,8S2.7,11.3,4,11.3z M16,7.3h-1.3v1.3H16c1.3,0,2.7,1.6,2.7,3.3
s-1.4,3.3-2.7,3.3h-5.3C9.4,15.3,8,13.7,8,12c0-1.1,0.6-2.2,1.3-2.8V7.7C7.9,8.4,6.7,10.1,6.7,12c0,2.4,2.1,4.7,4,4.7H16
c1.9,0,4-2.3,4-4.7S18,7.3,16,7.3z"/>
</svg>
<!-- Generated by pkgdown: do not edit by hand -->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>All news. htmltidy</title>
<!-- jquery -->
<script src="https://code.jquery.com/jquery-3.1.0.min.js" integrity="sha384-nrOSfDHtoPMzJHjVTdCopGqIqeYETSXhZDFyniQ8ZHcVy08QesyHcnOUpMpqnmWq" crossorigin="anonymous"></script>
<!-- Bootstrap -->
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<!-- Font Awesome icons -->
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" integrity="sha384-T8Gy5hrqNKT+hzMclPo118YTQO6cYprQmhrYwIiQ/3axmI1hQomh7Ud2hPOy8SP1" crossorigin="anonymous">
<!-- pkgdown -->
<link href="../pkgdown.css" rel="stylesheet">
<script src="../pkgdown.js"></script>
<!-- mathjax -->
<script src='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script>
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="container">
<header>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-bran