Commit 8768cf71 authored by Dan Allen's avatar Dan Allen

add architecture guidebook for document converter component

parent 450d95e1
= Document Converter Guidebook
== Context
In Antora, documentation articles (i.e., pages) are written in AsciiDoc.
Each non-partial AsciiDoc documents gets mapped to a web page in the site.
The AsciiDoc must be converted to HTML so the contents of the document can be displayed as a web page.
== Functional Overview
The files from the content catalog in the `page` family (i.e., documentation articles) are written in AsciiDoc format.
The document converter should use an AsciiDoc processor (e.g., Asciidoctor.js) to convert these files to HTML.
The documentation article populates the main area of the web page, so the AsciiDoc only needs to be converted to embeddable HTML.
That embeddable HTML gets enclosed in a web page template in a subsequent step to produce the complete page.
Page metadata in the document header may be used by the template to control the presentation of auxilary elements in the web page.
Therefore, that page metadata should also be accessible to the template via the file object.
As part of converting the AsciiDoc to HTML, the document converter should also resolve references in the AsciiDoc.
Specifically, it should:
* resolve xref macros that specify page references, verifying a match can be found in the content catalog
* add the appropriate path prefix to image references
* populate the attribute used in attachement references
* resolve the contents for include directives from partials and examples within the content catalog
When documentation conversion is complete, the contents property on the file object should be replaced with the embeddable HTML generated from the AsciiDoc and the asciidoc property should capture the attributes in the document header available.
== Software Architecture
The document converter component functionality is provided by the document-converter module.
At a high level, this component is an AsciiDoc to HTML converter that operates on the files from the content catalog in the `page` family.
All the details of converting the AsciiDoc documents to HTML, which includes resolving include directives and xref macros, should be encapsulated in this component.
Asciidoctor.js should be used to handle the conversion of AsciiDoc to HTML.
For each file in the `page` family, the document converter should:
* parse the AsciiDoc into an AST
** pass the value of the `contents` property of the file to the `Asciidoctor.load` API of Asciidoctor.js
** pass any options and attributes to `Asciidoctor.load` required for the document to work with Antora (see below)
** pass custom attributes defined by the user in the playbook (`asciidoc.attributes`)
* supply a custom include processor that resolves includes from the `partial` or `example` family in the content catalog
** the include processor should be an exportable function so it can be tested and reused
** if an include cannot be resolved, the generator should log a warning (meaning, don't be silent)
* supply a custom xref handler that resolves inter-page xrefs (which will be called during conversion)
** the xref handler should be an exportable function so it can be tested and reused
* retrieve the header attributes from the AST and assign them to the virtual file
** name the property `asciidoc.attributes`
* call convert on the AST
* assign the converted result to the `contents` property on the file
* discard the AST
=== AsciiDoc processor options and attributes
Here's a list of options that should be set when calling `Asciidoctor.load`:
* safe = 'safe'
Here's a list of default attributes that should be passed to `Asciidoctor.load`, but should be allowed to be overridden if defined in the playbook:
* source-highlighter = 'highlight.js'
* sectanchors = ''
* idprefix = ''
* idseparator = '-'
* icons = 'font'
Here's a list of mandatory attributes that should be set when calling `Asciidoctor.load`, which cannot be overridden:
* docname = file.src.stem
// Q: why not file.src.path?
* docfile = file.path
* docfilesuffix = file.src.extname
* env-site = ''
* imagesdir = file.out.moduleRootPath + '/_images'
* attachmentsdir = file.out.moduleRootPath + '/_attachments'
* examplesdir = file.src.moduleRootPath + '/examples'
* partialsdir = file.src.moduleRootPath + '/content/_partials'
=== xref handler
The xref handler should intercept the inline xref macro and handle any target that ends with `.adoc`.
The target is expected to have the following form:
All parts except for the filepath (i.e., `<topic/...><page>.adoc`) are optional.
For example:
The filepath is relative to the module's [.path]_pages_ directory.
The `<topic/...>` part is only required if the page is located inside a subdirectory.
If any part is missing, the value from the current context should be consumed.
For example, if the component is not specified, the component of the referring page (the page containing the xref) should be assumed.
The xref should be translated into a link to the URL of the target page (retrieved from the `pub.url` of the target file).
That URL should be made relative to the current page's URL so that the link can be followed even when the site is not served through a web server.
If the target page cannot be resolved, a warning should be emitted.
=== Include processor
The target of all include directives should be resolved from the content catalog.
Include target must point to a file local to the current module, either in the `fragment` family or the `example` family.
For example:
Due to current limitations in the AsciiDoc processor, this processor must handle the request to filter the included content by lines or tags.
=== Inputs and outputs
* Content catalog (`catalog`)
* `page` family from content catalog
* Playbook (`asciidoc`)
* _none_ (mutates the files from the `page` family in the content catalog)
== Code
The document converter is implemented as a dedicated node package (i.e., module).
The document converter API exports the `convertDocuments()` function, which takes a playbook and a content catalog and converts all documents in the `page` family.
It also exports the `convertDocument()` function, which can be used to operate on a single file.
The API for the document converter should be used as follows:
// Q: should the convertDocuments return a collection of files which were converted?
const convertDocuments = require('../packages/document-converter/lib/index')
convertDocuments(playbook, catalog)
Alternately, the pipeline can handle the conversion itself:
// TODO check this code
const convertDocument = require('../packages/document-converter/lib/convert-document')
catalog.findBy({ family: 'page' }).forEach((file) => {
await convertDocument(file, playbook, catalog)
== Data
The document converter mutates the files from the content catalog in the `page` family, which can be retrieved by invoking `findBy({ family: 'pages' })` on the content catalog.
Specifically, this component updates the value of the `contents` property by replacing the AsciiDoc with embeddable HTML.
The previous value is still accessible via the `history` property on the file.
It also assigns the hash of header attributes to the `asciidoc.attributes` property on the file object.
When converting each AsciiDoc document, this component incorporates global AsciiDoc attributes defined in the playbook (at the path `asciidoc.attributes`).
// Q: should it also incorporate attributes from antora.yml?
== Consequences
The document converter component allows documentation articles to be written in AsciiDoc.
Each documentation article (non-partial AsciiDoc document) becomes a web page in the generated site.
This component converts the AsciiDoc to embeddable HTML, which is used as the contents of the main area of the web page.
As a result of invoking the main function for this component, the file contents of the files from the content catalog in the `page` family has been converted from AsciiDoc to embeddable HTML.
The contents of these files are now ready to be wrapped in a web page template and written to the generated site.
This component hands off processing to the page generator component, which wraps the embeddable HTML created by this component in a web page template to produce a complete web page.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment