Commit 37428e78 authored by Sarah White

align terms and structure with content aggregator

parent 7ca4a2a1
* xref:architecture-guidebook.adoc[Content Classifier Component]
= Content Classifier Component Guidebook
= Content Classifier Architecture
== Context
To process files efficiently they need to be well organized.
The documentation pipeline needs to access this data through certain views and be able to query by path.
This data, which we call a [.term]_content catalog_, tells the documentation pipeline what each file is, what each file's source and destination paths are, and how to navigate between the files.
Antora`'s documentation pipeline needs to access the file data through certain views and be able to query the files by path.
This data, which we call a [.term]_content catalog_, tells the documentation pipeline what each file is, what each file`'s source and destination paths are, and how to navigate between the files.
The catalog is structured for optimal path search and file processing.
== Functional Overview
The content classifier component needs to access the playbook and content corpus.
Using those inputs it needs to be able to execute the following operations:
The content classifier component is responsible for populating each file with metadata, organizing the files for efficient processing, and creating the content catalog.
* Further partition the corpus into modules and then families
* Add additional metadata to the virtual files concerning information about its module, family, subpath, etc.
* Add a navigation index to navigation files
The content classifier needs to access the playbook and content aggregate.
Using those inputs, it needs to be able to execute the following operations:
* Further partition the aggregate into modules and then families
* Add additional metadata to the virtual files concerning information about their module, family, subpath, etc.
* Add a navigation index to the navigation files
* Create a lookup mechanism to find files matching certain criteria
* Build a structured catalog of virtual files that can be transmitted
== Software Architecture
The content classifier is the component responsible for populating each file with metadata, organizing the files for efficient processing, and creating the content catalog.
The content classifier component functionality is provided by the content-classifier module.
The details of calculating file paths, assigning calculated metadata to files, sorting files by family for processing, and adding each file to a structured and transmittable content catalog should be encapsulated in this component.
At this point the content corpus contains files that are grouped by component version.
The content classifier should:
* Access the aggregate, which is provided by the content aggregator component
* Walk each component version
** The walker should be aware of the following divisions: module, family, and topic
* Only keep files that match a known family and family file format
* Discard files that do not match allowed project or file structures
* Calculate each file source, output, and published path information
* Calculate source, output, and published path information for each file
** Assign source metadata to each file (`src` property)
** Assign output metadata to each file (`out` property)
** Assign published metadata to each file (`pub` property)
* Keep track of navigation order for each navigation file (`nav` property)
* Apply URL extension strategy to published files
* Apply a URL extension strategy to published files
* Add each file to the content catalog
** The content catalog is an index of virtual file entries keyed by page ID
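The walk described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the shape of the aggregate (`name`, `version`, `nav`, `files`), the `modules/<module>/pages/...` layout, the `relative` property, and the function name `classify` are hypothetical and not taken from the content-classifier module itself.

```javascript
// Hypothetical sketch of the classification walk; not the module's actual code.
// Assumed aggregate shape: [{ name, version, nav: [paths], files: [{ path }] }]
const PAGE_RX = /^modules\/([^/]+)\/(pages)\/(.+\.adoc)$/

function classify (aggregate) {
  const classified = []
  for (const { name, version, nav = [], files } of aggregate) {
    for (const file of files) {
      const navIndex = nav.indexOf(file.path)
      const match = PAGE_RX.exec(file.path)
      if (navIndex >= 0) {
        // Navigation file: remember its position in the nav list
        const moduleName = file.path.split('/')[1]
        const src = { component: name, version, module: moduleName, family: 'navigation' }
        classified.push({ ...file, src, nav: { index: navIndex } })
      } else if (match) {
        // File in a known family with a known format: assign source metadata
        const [, moduleName, family, relative] = match
        const src = { component: name, version, module: moduleName, family, relative }
        classified.push({ ...file, src })
      }
      // Any other file does not match an allowed structure and is discarded
    }
  }
  return classified
}
```

Only the `src` and `nav` assignments are shown here; the `out` and `pub` calculations would follow the same pattern.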
.Inputs
* Content corpus
* Content aggregate (`aggregate`)
* Playbook (URL strategies)
.Output
* Content catalog
* Content catalog (`catalog`)
== Code
The content classifier component is implemented as a dedicated node package (i.e., module).
Its APIs should be exported so that they can be required using the `require` keyword in the documentation pipeline.
The content classifier is implemented as a dedicated node package (i.e., module).
Its API exports several functions, the main function being `fileCatalog()`, which accepts a content aggregate data structure and a playbook instance.
The API for the content classifier should be used as follows:
[source,js]
----
const fileCatalog = require('../packages/content-classifier/lib/index')
//...
Public API of content catalog:
const catalog = await fileCatalog(playbook, aggregate)
----
The files in the catalog can be visited as follows:
[source,js]
----
Insert code snippet
----
.Content classifier API
* `getFiles()`
* `addFile()`
* `findBy({...})` with filter abilities
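To make the three operations listed above concrete, here is a minimal sketch of a catalog that supports them. The class name `ContentCatalogSketch`, the `pageId` helper, the key format, and the `relative` property are assumptions made for illustration; the real catalog may be organized differently.

```javascript
// Hypothetical sketch of a catalog offering getFiles(), addFile(), and findBy().
// The page ID format below is an assumption, not the module's actual key format.
function pageId (src) {
  return [src.version, src.component, src.module, src.family, src.relative].join('/')
}

class ContentCatalogSketch {
  constructor () {
    this.files = new Map() // virtual file entries keyed by page ID
  }

  addFile (file) {
    const id = pageId(file.src)
    if (this.files.has(id)) throw new Error(`Duplicate file: ${id}`)
    this.files.set(id, file)
    return file
  }

  getFiles () {
    return [...this.files.values()]
  }

  // Returns the files whose src properties match every key in the criteria object
  findBy (criteria) {
    const entries = Object.entries(criteria)
    return this.getFiles().filter(({ src }) => entries.every(([key, val]) => src[key] === val))
  }
}
```

With a catalog like this, a later component could call `findBy({ family: 'pages' })` to select only the pages.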
@@ -63,10 +81,12 @@ Public API of content catalog:
== Data
The content catalog object (instance of `FileCatalog`) produced by this component should have a well-defined, queryable index of virtual files.
The content catalog object (instance of `catalog`) produced by this component should have a well-defined, queryable index of virtual files.
Each virtual file in the content catalog should have the `src`, `out`, and `pub` properties fully populated.
It also has the `src.origin` property carried over from the content aggregator component.
The `src.origin` property information attached to each file should also be carried over from the `aggregate`.
Each virtual file object should include the following properties:
.src property
* `component`
@@ -96,12 +116,19 @@ It also has the `src.origin` property carried over from the content aggregator c
.pub property
* `url`
* `absoluteUrl` (using the site property from the playbook)
* `absoluteUrl` (using the `site` property from the playbook)
* `rootPath`
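As an illustration of how the three `pub` values might relate to one another, the sketch below derives them from an output path. The function name `computePub`, its parameters, and the interpretation of `rootPath` as the relative path from the file back to the site root are assumptions; the site URL is passed in directly rather than read from a playbook.

```javascript
// Hypothetical sketch: derives url, absoluteUrl, and rootPath from an output path.
// outPath is assumed to be a POSIX-style path relative to the site root.
function computePub (outPath, siteUrl) {
  const url = '/' + outPath
  // Number of directories between the site root and the file
  const depth = outPath.split('/').length - 1
  const rootPath = depth ? Array(depth).fill('..').join('/') : '.'
  const pub = { url, rootPath }
  // absoluteUrl is only available when a site URL is configured
  if (siteUrl) pub.absoluteUrl = siteUrl + url
  return pub
}
```

For example, under these assumptions an output path of `docs/1.0/index.html` yields a `rootPath` of `../..`.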
== Consequences
The content classifier component is responsible for the fine-grained organization of the virtual files.
The classifier organizes the files and allows subsequent components to request a specific file by its page ID or other grouping, such as component version or family.
* All destination information for each file has been determined and assigned
* Files can be queried by component version and/or family so they can be processed in parallel
* No subsequent components should have to organize the files for processing
* The content catalog is transmittable
* Pages can now be found and processed
The next component in Antora`'s documentation pipeline is the page generator.
The page generator requires the catalog as an input and operates on the files in the `pages` family.
* xref:architecture-guidebook.adoc[Content Classifier Architecture]
name: content-classifier
title: Content Classifier Component
title: Content Classifier
version: master
nav:
- modules/ROOT/nav.adoc