Commit 65ec86d8 authored by Dan Allen

merge !31

resolves #23 add content classifier architecture guidebook
parents 96ba1ddd 4b664e15
:attachmentsdir: {moduledir}/assets/attachments
:fragmentsdir: {moduledir}/documents/_fragments
:imagesdir: {moduledir}/assets/images
:samplesdir: {moduledir}/samples
:moduledir: ..
= Content Classifier Architecture
== Context
Throughout Antora's documentation pipeline, components frequently need to retrieve files by group or look up a file at a specific path.
Therefore, these files need to be well organized in order for the components to process them efficiently.
This component provides that function.
The collection of files this component creates, which we call a [.term]_content catalog_, is the primary interface components will use to access the virtual files and their metadata, which includes source and destination paths.
The catalog should be structured for optimal querying.
== Functional Overview
The content classifier component is responsible for populating each file with metadata pertaining to where it's published, organizing the files for efficient processing, and putting the files into a content catalog.
It effectively transforms the content aggregate into a content catalog.
The content classifier requires the playbook and the content aggregate as input.
Using those inputs, it should carry out the following operations:
* Further partition the content aggregate into modules and families
** Build on the work the content aggregator did to organize the files by component version
* Reject files that are not in a recognized location
* Add metadata to the virtual files about their module, family, subpath, etc.
* Compute the output path (disk) and publish path (URL) information for each publishable file
//* Add a navigation index to the navigation files
* Create a lookup mechanism to find files matching certain criteria
* Build a structured catalog of virtual files that can be queried and transmitted
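
The partitioning and rejection steps can be sketched as follows.
This is a minimal illustration, not the component's implementation: the function name `partitionPath`, the `modules/<module>/...` layout, and the folder-to-family mapping (inferred from the attribute paths in this document) are assumptions made for the example.

```javascript
// Hypothetical mapping from a folder inside a module to a file family.
// More specific folders are listed first so they match before their parents.
const FAMILY_FOLDERS = [
  ['documents/_fragments/', 'fragment'],
  ['documents/', 'page'],
  ['assets/images/', 'image'],
  ['assets/attachments/', 'attachment'],
  ['samples/', 'sample'],
]

// Split a path (relative to the component version root) into module, family,
// subpath, and basename. Returns undefined for an unrecognized location.
function partitionPath (path) {
  const segments = path.split('/')
  if (segments[0] !== 'modules' || segments.length < 3) return undefined
  const moduleName = segments[1]
  const rest = segments.slice(2).join('/')
  if (rest === 'nav.adoc') {
    return { module: moduleName, family: 'navigation', subpath: '', basename: 'nav.adoc' }
  }
  for (const [folder, family] of FAMILY_FOLDERS) {
    if (!rest.startsWith(folder)) continue
    const relative = rest.slice(folder.length)
    if (!relative) return undefined
    // Assumed format rule: pages must be AsciiDoc files.
    if (family === 'page' && !relative.endsWith('.adoc')) return undefined
    const idx = relative.lastIndexOf('/')
    return {
      module: moduleName,
      family,
      subpath: idx < 0 ? '' : relative.slice(0, idx),
      basename: relative.slice(idx + 1),
    }
  }
  return undefined
}
```

A file such as `README.md` at the root of the component version falls outside every recognized folder, so the sketch returns `undefined` and the caller would discard it.
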
== Software Architecture
The content classifier component functionality is provided by the content-classifier module.
The details of calculating file paths, assigning calculated metadata to files, sorting files by family for processing, and adding each file to a structured and transmittable content catalog should be encapsulated in this component.
The content classifier should:
* Access the content aggregate, which is produced by the content aggregator component
* Walk the files in each component version
** The file walker should be aware of the following divisions: module, family, and topic
* Only keep files that match a known family and family file format
* Discard files that do not match allowed project or file structures
* Calculate source, output, and published path information for each file
** Assign source metadata to each file (`src` property)
** Assign output metadata to each file (`out` property)
** Assign published metadata to each file (`pub` property)
*** Apply the URL extension style to published files (drop or indexify)
* Keep track of navigation order for each navigation file (`nav` property)
* Add each file to the content catalog
** The content catalog is an index of virtual file entries keyed by page ID

.Inputs
* Content aggregate (`aggregate`)
* Playbook (`urls.htmlExtensionStyle` and `site.url`)

.Output
* Content catalog (`catalog`)
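
The path calculation can be illustrated with a short sketch.
It is a hedged example rather than the actual algorithm: the function name `computeOutAndPub`, the use of a `subpath` value on `src`, and the exact effect of the `drop` and `indexify` styles (dropping `.html` from the URL, or moving the page to `<stem>/index.html`) are assumptions.

```javascript
// Compute hypothetical `out` and `pub` properties for a page.
// `htmlExtensionStyle` and `siteUrl` stand in for the playbook values
// urls.htmlExtensionStyle and site.url.
function computeOutAndPub (src, htmlExtensionStyle, siteUrl) {
  // Files in the ROOT module are promoted a level above the named modules.
  const dirSegments = [src.component, src.version]
  if (src.module !== 'ROOT') dirSegments.push(src.module)
  if (src.subpath) dirSegments.push(src.subpath)

  let basename = src.stem + '.html'
  // Assumed behavior: indexify turns name.html into name/index.html.
  if (htmlExtensionStyle === 'indexify' && src.stem !== 'index') {
    dirSegments.push(src.stem)
    basename = 'index.html'
  }
  const dirname = dirSegments.join('/')
  const path = dirname + '/' + basename

  let url = '/' + path
  if (htmlExtensionStyle === 'indexify' || (htmlExtensionStyle === 'drop' && basename === 'index.html')) {
    // The trailing slash implies index.html.
    url = '/' + dirname + '/'
  } else if (htmlExtensionStyle === 'drop') {
    // Assumed behavior: drop removes the .html extension from the URL only.
    url = url.slice(0, -'.html'.length)
  }

  const pub = { url }
  if (siteUrl) pub.absoluteUrl = siteUrl + url
  return { out: { dirname, basename, path }, pub }
}
```

Keeping `out` (where the file is written) separate from `pub` (how the file is addressed) lets the URL style change without affecting how other files refer to the output location.
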
== Code
The content classifier is implemented as a dedicated Node.js package (i.e., module).
Its API exports the `classifyContent()` function, which accepts a playbook instance and the content aggregate data structure.
The API for the content classifier should be used as follows:
[source,js]
----
const classifyContent = require('../packages/content-classifier/lib/index')
const catalog = classifyContent(playbook, aggregate)
----
The files in the catalog can be queried as follows:
[source,js]
----
const pages = catalog.findBy({ family: 'page' })
----
.Content classifier API
* `getFiles()`
* `addFile()`
* `findBy({...})` with filter abilities
** filterable properties: `component`, `version`, `module`, `family`, `subpath`, `stem`, and `basename`
* `getByID({...})`
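
To make the shape of this API concrete, here is a minimal sketch of a catalog that offers the same four methods.
The class name `SketchCatalog`, the `keyFor` helper, and the format of the key are hypothetical; the real catalog may index and match files differently.

```javascript
// A minimal, hypothetical catalog mirroring the methods listed above.
class SketchCatalog {
  constructor () {
    this.index = new Map() // key string -> virtual file
  }

  // Build a string key from identifying src properties (assumed format).
  static keyFor ({ component, version, module, family, subpath, basename }) {
    return [component, version, module, family, subpath || '', basename].join('|')
  }

  // Add a file to the index, refusing duplicates.
  addFile (file) {
    const key = SketchCatalog.keyFor(file.src)
    if (this.index.has(key)) throw new Error('Duplicate file: ' + key)
    this.index.set(key, file)
  }

  // Return all files in insertion order.
  getFiles () {
    return [...this.index.values()]
  }

  // Return the files whose src properties equal every criterion given.
  findBy (criteria) {
    const entries = Object.entries(criteria)
    return this.getFiles().filter((file) => entries.every(([name, value]) => file.src[name] === value))
  }

  // Look up a single file by its identifying properties.
  getByID (id) {
    return this.index.get(SketchCatalog.keyFor(id))
  }
}
```

In this sketch, `findBy()` scans every entry, which is adequate for illustration; a catalog structured for optimal querying would more likely maintain secondary indexes per family or component version.
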
== Data
The classifier assumes that each documentation component adheres to the following filesystem structure:
There must be one or more modules.
Files in the ROOT module are promoted a level above the named modules when published (effectively belonging to the component version itself).
AsciiDoc files are assumed to have the file extension `.adoc`.
Files and folders that begin with an underscore are not published.
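
The underscore rule can be expressed as a one-line check.
The function name `isPublishable` is hypothetical, and the sketch assumes forward-slash separated paths relative to the module.

```javascript
// Returns false if any file or folder name in the path begins with an underscore.
function isPublishable (path) {
  return !path.split('/').some((segment) => segment.startsWith('_'))
}
```

Under this check, `documents/_fragments/note.adoc` is not publishable because one of its folders begins with an underscore, while `documents/install.adoc` is.
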
The content catalog object (instance of `FileCatalog`) produced by this component should have a well-defined, queryable index of virtual files.
Each virtual file in the content catalog should have the `src`, `out`, and `pub` properties fully populated.
The `src.origin` property information attached to each file should also be carried over from the `aggregate`.
Each virtual file object should include the following properties:
.src property
* `component`
* `version`
* `module`
* `family` (navigation, fragment, page, image, attachment, or sample)
* `topics`
* `moduleRootPath`
* `basename`
* `mediaType`
* `stem`
* `extname`
* `origin`
.out property
* `dirname`
* `basename`
* `path`
* `moduleRootPath`
* `rootPath`
.pub property
* `url`
* `absoluteUrl` (using the `site.url` property from the playbook)
* `rootPath`
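
The property lists above can double as a completeness check, for example in a test.
The following sketch uses only the property names from those lists; the names `REQUIRED_PROPERTIES` and `findMissingProperties` are hypothetical.

```javascript
// Property names taken from the src, out, and pub lists in this document.
const REQUIRED_PROPERTIES = {
  src: ['component', 'version', 'module', 'family', 'topics', 'moduleRootPath',
    'basename', 'mediaType', 'stem', 'extname', 'origin'],
  out: ['dirname', 'basename', 'path', 'moduleRootPath', 'rootPath'],
  pub: ['url', 'absoluteUrl', 'rootPath'],
}

// Return the dotted names (e.g. 'pub.url') of properties that are not set.
function findMissingProperties (file) {
  const missing = []
  for (const [group, names] of Object.entries(REQUIRED_PROPERTIES)) {
    const values = file[group] || {}
    for (const name of names) {
      if (values[name] === undefined) missing.push(group + '.' + name)
    }
  }
  return missing
}
```

An empty result means the file carries every listed property; whether files that are never published should be exempt from the `out` and `pub` checks is left open here.
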
== Consequences
The content classifier component is responsible for the fine-grained organization of the virtual files.
The classifier organizes the files and allows subsequent components to request a specific file by its page ID or other grouping, such as component version or family.
* All destination information for each file has been determined and assigned
* Files can be queried by component version and/or family so they can be processed in parallel
* No subsequent components should have to organize the files for processing
* The content catalog is transmittable
* Pages can now be found and processed
The next component in Antora`'s documentation pipeline is the page generator.
The page generator requires the catalog as an input and operates on the files in the `page` family.
* xref:architecture-guidebook.adoc[Architecture Guidebook]
name: content-classifier
title: Content Classifier Component
version: master
- modules/ROOT/nav.adoc