Commit 1b4fe230 authored by Sarah White's avatar Sarah White

add content aggregator architecture guidebook

- set up devdocs root module
- set up devdocs project config files
parent 9d8600ff
:moduledir: ..
:attachmentsdir: {moduledir}/assets/attachments
:fragmentsdir: {moduledir}/documents/_fragments
:imagesdir: {moduledir}/assets/images
:samplesdir: {moduledir}/samples
= Content Aggregator Guidebook
// TODO:
// - define documentation component
// - define content corpus: a coarse-grained body of work; a set of arrays of vinyl files grouped by origin (repository and branch)
// - complete sections
== Context
The documentation pipeline does not own any content files.
Instead, it must retrieve all of the site's input files from multiple repositories and branches and organize them into a content corpus that preserves each file's origin.
== Functional Overview
// I'm trying to avoid defining more than the final output term here b/c otherwise this section would bulk up really quickly and then we might as well just delete it and go right into the software architecture section.
The content aggregator component reads the configuration options provided by a playbook.
It then completes the following actions as specified in the playbook and produces a [.term]_content aggregate_.
// definition of term(light): content aggregate, see the data section for the heavy definition
The content aggregate is a transmittable collection of coarsely sorted virtual content files.
* Fetch the remote repositories
* Identify remote and local repository branches
* Find the [.term]_documentation component_ in each branch
// definition of term: documentation component
** The documentation component is a group of documentation files that are versioned together and share a common subject
* Find information about each documentation component version
* Collect all the files in each documentation component branch
* Group the files by documentation component version into a content aggregate
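The steps above culminate in a single data structure. As a minimal sketch (the property names are illustrative assumptions, not the component's exact output), the content aggregate can be pictured as one entry per documentation component version, keyed by `version@component`:

```javascript
// Illustrative shape of a content aggregate: one entry per
// documentation component version, each holding the files
// collected from that branch. Property names are assumptions
// for illustration only.
const contentAggregate = [
  {
    name: 'content-aggregator', // from antora.yml
    title: 'Content Aggregator Component',
    version: 'master',
    files: [
      { path: 'modules/ROOT/documents/architecture-guidebook.adoc' },
      { path: 'modules/ROOT/nav.adoc' },
    ],
  },
]

// Each entry is identified conceptually by its component version key.
const componentVersionKey = (entry) => `${entry.version}@${entry.name}`
```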
== Software Architecture
The content aggregator component's functionality is provided by index.js, the entry point file for the component, and mime.js.
The mime.js file associates the file extension .adoc with the internet media type text/asciidoc.
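As a hypothetical sketch of what mime.js accomplishes (the actual file may delegate to a MIME library rather than use a plain lookup table), the association reduces to a map from file extension to media type:

```javascript
// Standalone sketch: map file extensions to internet media types,
// registering .adoc as text/asciidoc. The table and fallback value
// are illustrative assumptions, not the component's actual API.
const mediaTypes = {
  '.adoc': 'text/asciidoc',
  '.html': 'text/html',
  '.png': 'image/png',
}

function lookupMediaType (extname) {
  // fall back to a generic binary type for unknown extensions
  return mediaTypes[extname] || 'application/octet-stream'
}
```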
All the details of locating, cloning, and fetching the repositories, identifying branches, finding documentation components along with their information and files, and grouping the files by documentation component version into a content aggregate should be encapsulated in the content aggregator component.
The content aggregator component should:
* Load a playbook
* Locate input repositories
* Use a git integration library (NodeGit) to clone the repositories the first time and fetch them subsequent times
* Work with private, public, bare, and non-bare repositories over SSH or HTTPS, as well as local directories that are git repositories (worktrees)
** Repository origins could be GitHub, GitLab, Bitbucket, etc.
* Put cloned repositories into cache for subsequent runs
* Use local directories in place
* Scan all branches per repository to find playbook matches
** Local branches take precedence over remote branches with the same name
* Visit each matched branch and find the [.term]_component description file_ (antora.yml) at the root of each documentation component
* Walk each documentation component subtree and collect input files
* Create a virtual file for each collected file (vinyl)
* Assign a [.term]_component version key_ (`version@component`) to each virtual file
// definition of term: component version key
** The component version key is the primary association for a content file
** Each content file must be in exactly one component version
* Add origin metadata (`src` property) to each virtual file
* Sort files by component version key and build them into a content aggregate
** The content aggregate should be transmittable
* Parallelize input processing whenever possible and perform a join once the content aggregate is complete (map-reduce)
.Inputs
* Playbook (`content.sources`)

// File aggregate, content aggregate, aggregate??? Either way, should align with classifier's File catalog, content catalog, catalog
.Outputs
* Content aggregate (`aggregate`)
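The grouping step in the list above can be sketched as follows. This is an illustrative reduction, not the component's actual implementation: collected files are bucketed by component version key (`version@component`), then the buckets are sorted so the aggregate is deterministic and transmittable.

```javascript
// Sketch of the coarse grouping step: bucket collected files by
// component version key, then emit one aggregate entry per bucket.
// Function and property names are assumptions for illustration.
function buildAggregate (componentVersions) {
  const buckets = new Map()
  for (const { name, version, files } of componentVersions) {
    const key = `${version}@${name}`
    if (!buckets.has(key)) buckets.set(key, { name, version, files: [] })
    // merge files collected from the same component version
    buckets.get(key).files.push(...files)
  }
  // sort by component version key for a deterministic output
  return [...buckets.values()].sort((a, b) =>
    `${a.version}@${a.name}`.localeCompare(`${b.version}@${b.name}`))
}
```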
== Code
The content aggregator component is implemented as a dedicated node package (i.e., module).
// Are there any public API methods that need to be introduced here?
== Data
// preliminary definition of term(heavy): content aggregate, see the overview section for the light definition
The content aggregate data structure produced by this component should index files by component version (`version@component`).
The component version information should be loaded from antora.yml files.
Each antora.yml file is located at the root of its documentation component's directory tree.
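For example, the component description file for this very component (as added in this commit) declares the name, title, and version that feed the component version key:

```yaml
name: content-aggregator
title: Content Aggregator Component
version: master
```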
Each file should include the following properties:
.src property
* `basename`
* `mediaType`
* `stem`
* `extname`
* `origin`
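Putting the property list above together, the src metadata for one collected virtual file might look like the following sketch. The values are derived from this guidebook's own file; the shape of `origin` is an assumption for illustration.

```javascript
// Illustrative src metadata for one collected virtual file,
// using the properties listed above. The origin sub-properties
// and the repository URL are hypothetical.
const src = {
  basename: 'architecture-guidebook.adoc',
  stem: 'architecture-guidebook',
  extname: '.adoc',
  mediaType: 'text/asciidoc',
  origin: {
    url: 'https://example.org/docs.git', // hypothetical repository URL
    branch: 'master',
  },
}
```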
== Consequences
The content aggregator component allows the documentation pipeline to work with content from multiple repositories and their branches.
This component enables the rest of the pipeline to work on virtual files.
* No other pipeline components need to know how to get the files from their repositories.
** All subsequent processing is done on the virtual file objects created by the content aggregator.
** While subsequent components don't interface with the file origins directly, they can still determine where each file came from through its origin metadata.
* Files are only coarsely sorted in the content aggregator.
** The content aggregator doesn't sort the files further because extensions should be allowed to easily contribute files without the component needing to recompute the data source.
* The next component in the Antora pipeline, the content classifier, is responsible for fine-grained organization.
** The classifier organizes the files and allows subsequent components to request a specific file by its page ID or other grouping, such as component version or family.
* xref:architecture-guidebook.adoc[Content Aggregator Architecture Guidebook]
name: content-aggregator
title: Content Aggregator Component
version: master
nav:
- modules/ROOT/nav.adoc