Commit d6c66186 authored by Dan Allen's avatar Dan Allen Committed by Dan Allen

merge !37

resolves #26 add architecture guidebook for document converter
parents 450d95e1 9691e438
Pipeline #14678527 passed with stages
in 2 minutes and 42 seconds
= Document Converter Guidebook
== Context
In Antora, documentation pages are written in AsciiDoc.
Each non-partial, AsciiDoc document under the [.path]_pages_ folder gets mapped to a web page in the site.
The AsciiDoc must be converted to HTML so the contents of these documents can be displayed as a web page.
== Functional Overview
The files in the `page` family of the content catalog (i.e., text documents in the [.path]_pages_ directory) are written in AsciiDoc format.
The document converter should use an AsciiDoc processor (e.g., Asciidoctor.js) to convert these files to HTML.
These documents are used to populate the main area of the web page, so the AsciiDoc only needs to be converted to embeddable HTML.
That embeddable HTML is later enclosed in a web page template by the page generator component to produce the complete page.
Page metadata in the document header may be used by the template to control the presentation of auxiliary elements in the web page.
Therefore, that page metadata should be accessible to the template via the file object.
As part of converting the AsciiDoc to HTML, the document converter should also resolve references in the AsciiDoc.
Specifically, it should:
* resolve xref macros that specify page references and verify a match can be found in the content catalog
* add the appropriate path prefix to image references
* populate the attribute used in attachment references
* resolve the contents for include directives from partials and examples within the content catalog
When document conversion is complete, the `contents` property on the file object should be replaced with the embeddable HTML generated from the AsciiDoc, and the `asciidoc` property should capture the available attributes in the document header.
== Software Architecture
The document converter component functionality is provided by the document-converter module.
At a high level, this component is an AsciiDoc to HTML converter that operates on the files in the `page` family of the content catalog.
All the details of converting the AsciiDoc documents to HTML, such as resolving include directives and xref macros, should be encapsulated in this component.
Asciidoctor.js should be used to handle the conversion of AsciiDoc to HTML.
For each file in the `page` family, the document converter should:
* parse the AsciiDoc into an abstract syntax tree (AST)
** pass the value of the `contents` property of the file to the `Asciidoctor.load` API of Asciidoctor.js
** pass any options and attributes to `Asciidoctor.load` required for the document to work with Antora (see below)
** pass custom attributes defined by the user in the playbook (`asciidoc.attributes`)
* supply a custom include processor that resolves includes from the `partial` or `example` family in the content catalog
** the include processor should be an exportable function so it can be tested and reused
** if an include cannot be resolved, the generator should log a warning (meaning, don't be silent)
* supply a custom xref handler that resolves inter-page xrefs (which will be called during conversion)
** the xref handler should be an exportable function so it can be tested and reused
* retrieve the header attributes from the AST and assign them to the virtual file
** name the property `asciidoc.attributes`
* call convert on the AST
* assign the converted result to the `contents` property on the file
* discard the AST
=== AsciiDoc processor options and attributes
Here's a list of options that should be set when calling `Asciidoctor.load`:
* safe = 'safe'
Here's a list of default attributes that should be passed to `Asciidoctor.load`, but should be allowed to be overridden if defined in the playbook:
* source-highlighter = 'highlight.js'
* sectanchors = ''
* idprefix = ''
* idseparator = '-'
* icons = 'font'
Here's a list of mandatory attributes that should be set when calling `Asciidoctor.load`, which cannot be overridden:
* docname = file.src.stem
// Q: why not file.src.path?
* docfile = file.path
* docfilesuffix = file.src.extname
* env-site = ''
* imagesdir = file.out.moduleRootPath + '/_images'
* attachmentsdir = file.out.moduleRootPath + '/_attachments'
* examplesdir = file.src.moduleRootPath + '/examples'
* partialsdir = file.src.moduleRootPath + '/content/_partials'
=== xref handler
The xref handler should intercept the inline xref macro and handle any target that ends with `.adoc`.
The target is expected to have the following form:
<version@><component:><module:><topic/...><page>.adoc<#fragment>[<text>]
All parts except for the filepath (i.e., `<topic/...><page>.adoc`) are optional.
For example:
develop/best-practices.adoc
The filepath is relative to a module's [.path]_pages_ directory.
The `<topic/...>` part is only required if the page is located inside a subdirectory.
If any part is missing, the value from the current context should be consumed.
For example, if the component is not specified, the component of the referring page (the page containing the xref) should be assumed.
The xref should be translated into a link to the URL of the target page.
The URL of the target page should be retrieved from the `pub.url` of the target file.
That URL should be made relative to the current page's URL so that the link can be followed even when the site is not served through a web server.
If the target page cannot be resolved, a warning should be emitted.
=== Include processor
The target of all include directives should be resolved from the content catalog.
An include target must point to a file local to the current module, either in the `fragment` family or the `example` family.
For example:
include::{partialsdir}/partial.adoc[]
include::{examplesdir}/example.json[]
Due to current limitations in the AsciiDoc processor, this processor must handle the request to filter the included content by lines or tags.
=== Inputs and outputs
.Inputs
* Content catalog (`catalog`)
* `page` family from the content catalog
* Playbook (`asciidoc`)
.Output
* _none_ (mutates the files in the `page` family of the content catalog)
== Code
The document converter is implemented as a dedicated node package (i.e., module).
The document converter API exports the `convertDocuments()` function, which takes a playbook and a content catalog and converts all documents in the `page` family.
It also exports the `convertDocument()` function, which can be used to operate on a single file.
The API for the document converter should be used as follows:
// Q: should the convertDocuments return a collection of files which were converted?
[source,js]
----
const convertDocuments = require('../packages/document-converter/lib/index')
//...
convertDocuments(playbook, catalog)
----
Alternately, the pipeline can handle the conversion itself:
// TODO check this code
[source,js]
----
const convertDocument = require('../packages/document-converter/lib/convert-document')
//...
catalog.findBy({ family: 'page' }).forEach((file) => {
await convertDocument(file, playbook, catalog)
})
----
== Data
The document converter mutates the files in the `page` family of the content catalog, which can be retrieved by invoking `findBy({ family: 'page' })` on the content catalog.
Specifically, this component updates the value of the `contents` property by replacing the AsciiDoc with embeddable HTML.
The previous value is still accessible via the `history` property on the file.
It also assigns a hash of header attributes to the `asciidoc.attributes` property on the file object.
When converting each AsciiDoc document, this component incorporates global AsciiDoc attributes defined in the playbook (at the path `asciidoc.attributes`).
// Q: should it also incorporate attributes from antora.yml?
== Consequences
The document converter component allows text documents to be written in AsciiDoc.
Each non-partial, AsciiDoc document becomes a web page in the generated site.
This component converts the AsciiDoc to embeddable HTML, which is used as the contents of the main area of the web page.
As a result of invoking the main function for this component, the file contents of the files in the `page` family of the content catalog has been converted from AsciiDoc to embeddable HTML.
The contents of these files are now ready to be wrapped in a web page template and written to the generated site.
This component hands off processing to the page generator component, which wraps the embeddable HTML created by this component in a web page template to produce a complete web page.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment