Commit 1f2c7355 authored by Mike Ryan

#58: Rename DataProperty interface to Property, DataRecord interface to Record.

parent 9a6aa901
Pipeline #58451097 passed with stage in 2 minutes and 17 seconds
......@@ -10,6 +10,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
### Changed
- `EtlTask` now accepts its extract, key_map, and load components as object instances rather than constructing them from configuration.
- `DataProperty` interface renamed to `Property`, and `DataRecord` interface renamed to `Record`.
### Added
- The `Filter` interface has been added, to determine whether a DataRecord should be processed.
......
......@@ -2,7 +2,7 @@
This is the API reference documentation for Soong, generated from the code using [Doxygen](http://doxygen.nl/).
All components other than `DataProperty` and `DataRecord` take a keyed configuration array as their single constructor argument. The component interfaces all inherit from [ConfigurableComponent][d086dd58], and at the moment all concrete configurable component classes inherit from [OptionsResolverComponent][f40a8c73] which is based on [Symfony OptionsResolver][ca87104b]. All components using this base class must implement [optionDefinitions()][43def4f7] to define the configuration options they accept.
All components other than `Property` and `Record` take a keyed configuration array as their single constructor argument. The component interfaces all inherit from [ConfigurableComponent][d086dd58], and at the moment all concrete configurable component classes inherit from [OptionsResolverComponent][f40a8c73] which is based on [Symfony OptionsResolver][ca87104b]. All components using this base class must implement [optionDefinitions()][43def4f7] to define the configuration options they accept.
[d086dd58]: interface_soong_1_1_contracts_1_1_configuration_1_1_configurable_component.html "ConfigurableComponent"
[f40a8c73]: class_soong_1_1_configuration_1_1_options_resolver_component.html "OptionsResolverComponent"
......@@ -11,9 +11,9 @@ All components other than `DataProperty` and `DataRecord` take a keyed configura
As an ETL framework, the key components of Soong are of course:
- [Extractors][1b68bb10]: Extractors read data from a source data store and via `extract*()` methods produce iterators to deliver one record at a time as a `DataRecord` instance. They accept configuration to determine where and how to access the source data, including filters (see below) to control what records to process on a given invocation. Being able to tell how many source records are available for migration is very helpful, although on occasion there may be data sources where this is impossible (or at least very slow) - therefore, countability is not required by `Extractor`. Most extractors will want to implement `\Countable` (a `CountableExtractorBase` class is provided which should be a good starting point for most extractors).
- [Extractors][1b68bb10]: Extractors read data from a source data store and via `extract*()` methods produce iterators to deliver one record at a time as a `Record` instance. They accept configuration to determine where and how to access the source data, including filters (see below) to control what records to process on a given invocation. Being able to tell how many source records are available for migration is very helpful, although on occasion there may be data sources where this is impossible (or at least very slow) - therefore, countability is not required by `Extractor`. Most extractors will want to implement `\Countable` (a `CountableExtractorBase` class is provided which should be a good starting point for most extractors).
- [Transformers][f8e7b6dc]: A Transformer class accepts a value (usually a property from an extractor-produced record) and produces a new value.
- [Loaders][d4c501b1]: Loaders accept one `DataRecord` instance at a time and load the data it contains into a destination as configured. Note that not all destinations may permit deleting loaded data (e.g., a loader could be used to output a CSV file). The deletion capability (used by rollback operations) should be moved to a separate interface.
- [Loaders][d4c501b1]: Loaders accept one `Record` instance at a time and load the data it contains into a destination as configured. Note that not all destinations may permit deleting loaded data (e.g., a loader could be used to output a CSV file). The deletion capability (used by rollback operations) should be moved to a separate interface.
[1b68bb10]: interface_soong_1_1_contracts_1_1_extractor_1_1_extractor.html "Extractor"
[f8e7b6dc]: interface_soong_1_1_contracts_1_1_transformer_1_1_transformer.html "Transformer"
......@@ -21,19 +21,19 @@ As an ETL framework, the key components of Soong are of course:
The ETL pipeline components need to communicate the data they handle with each other - extractor outputs need to pass through a series of transformers and ultimately into a loader. The canonical representation of such data would be an associative array of arbitrarily-typed values, but rather than require a specific representation it is more flexible to abstract the data.
- [DataProperty][81696853]: Represents a value (which could be a scalar, an array, or an object). Implementations of DataProperty should be immutable - the value should be set at construction time and may not subsequently be changed. The value may be any scalar, array, or object type - including `DataProperty`.
- [DataRecord][ba5fb4bd]: A data record (a set of named `DataProperty` instances) is represented by `DataRecord`. In the context of an ETL pipeline, an extractor will output a `DataRecord` to input to transformers, and the transformation process will populate another instance of `DataRecord` one property at a time to ultimately pass to a loader.
- [Property][81696853]: Represents a value (which could be a scalar, an array, or an object). Implementations of Property should be immutable - the value should be set at construction time and may not subsequently be changed. The value may be any scalar, array, or object type - including `Property`.
- [Record][ba5fb4bd]: A data record (a set of named `Property` instances) is represented by `Record`. In the context of an ETL pipeline, an extractor will output a `Record` to input to transformers, and the transformation process will populate another instance of `Record` one property at a time to ultimately pass to a loader.
[81696853]: interface_soong_1_1_contracts_1_1_data_1_1_data_property.html "DataProperty"
[ba5fb4bd]: interface_soong_1_1_contracts_1_1_data_1_1_data_record.html "DataRecord"
[81696853]: interface_soong_1_1_contracts_1_1_data_1_1_property.html "Property"
[ba5fb4bd]: interface_soong_1_1_contracts_1_1_data_1_1_record.html "Record"
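As a rough illustration of the `Property`/`Record` relationship described above, here is a minimal sketch. The interfaces are simplified stand-ins declared locally (the real ones in `Soong\Contracts\Data` appear to declare more methods), and `ExampleProperty` and `ExampleRecord` are hypothetical names, not classes shipped with Soong.

```php
<?php
declare(strict_types=1);

// Simplified stand-ins for the Property and Record contracts (assumption:
// only the methods needed for this illustration are declared).
interface Property
{
    public function getValue();
}

interface Record
{
    public function setProperty(string $propertyName, Property $propertyValue) : void;
    public function getProperty(string $propertyName) : Property;
    public function toArray() : array;
}

// Hypothetical immutable property: the value is set once, at construction,
// and there is no setter.
final class ExampleProperty implements Property
{
    private $value;

    public function __construct($value)
    {
        $this->value = $value;
    }

    public function getValue()
    {
        return $this->value;
    }
}

// Hypothetical array-backed record: a set of named Property instances.
final class ExampleRecord implements Record
{
    /** @var Property[] */
    private $data = [];

    public function setProperty(string $propertyName, Property $propertyValue) : void
    {
        $this->data[$propertyName] = $propertyValue;
    }

    public function getProperty(string $propertyName) : Property
    {
        // A missing property is returned as a null-valued property.
        return $this->data[$propertyName] ?? new ExampleProperty(null);
    }

    public function toArray() : array
    {
        $result = [];
        foreach ($this->data as $name => $property) {
            $result[$name] = $property->getValue();
        }
        return $result;
    }
}

$record = new ExampleRecord();
$record->setProperty('id', new ExampleProperty(5));
$record->setProperty('name', new ExampleProperty('widget'));
```

Because a property cannot be changed after construction, "updating" a value means building a new property and setting it on the record under the same name.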
To manage the migration process, we have:
- [Task][845d1aeb]: A named object controlling the execution of operations according to a set of configuration. Most tasks will be ETL tasks, designed to migrate data, but the overall migration process may require some non-ETL housekeeping tasks (like moving files around) - classes derived from `Task` rather than `EtlTask` can be used to incorporate these operations.
- [EtlTask][fd591c8f]: A Task specifically designed to perform operations on data using extractors, transformers, and loaders. The most important operation is `migrate`, which will:
1. Invoke an `Extractor` instance and iterate over its data set, retrieving one source `DataRecord` at a time.
2. Create a destination `DataRecord`, and for each property to be stored in this record, execute one or more `Transformer` instances to derive the destination property from source properties and configuration.
3. Pass the destination `DataRecord` to a `Loader` instance for final disposition.
1. Invoke an `Extractor` instance and iterate over its data set, retrieving one source `Record` at a time.
2. Create a destination `Record`, and for each property to be stored in this record, execute one or more `Transformer` instances to derive the destination property from source properties and configuration.
3. Pass the destination `Record` to a `Loader` instance for final disposition.
- [TaskPipeline][ec470e98]: Manages a list of Tasks.
[845d1aeb]: interface_soong_1_1_contracts_1_1_task_1_1_task.html "Task"
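The three `migrate` steps listed above can be sketched roughly as follows. This is not the `EtlTask` implementation: plain arrays stand in for `Record` objects, plain callables stand in for `Extractor`, `Transformer`, and `Loader` instances, and every function name and the shape of the `$transform` array are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/** Step 1: a stand-in "extractor" yielding one source record at a time. */
function extractExample(array $rows) : iterable
{
    foreach ($rows as $row) {
        yield $row;
    }
}

/**
 * Steps 2 and 3: build a destination record one property at a time by
 * running each property's transformer chain, then hand it to the loader.
 */
function migrateExample(iterable $source, array $transform, callable $loader) : void
{
    foreach ($source as $sourceRecord) {
        $destination = [];
        foreach ($transform as $property => $pipeline) {
            // Start from the named source property (null if absent).
            $value = $sourceRecord[$pipeline['source']] ?? null;
            foreach ($pipeline['transformers'] as $transformer) {
                $value = $transformer($value);
            }
            $destination[$property] = $value;
        }
        $loader($destination);
    }
}

$loaded = [];
migrateExample(
    extractExample([['id' => 1, 'score' => 10], ['id' => 2, 'score' => 20]]),
    [
        // Empty chain: the source value is passed through unchanged.
        'id' => ['source' => 'id', 'transformers' => []],
        // Two chained transformers: double the value, then add one.
        'doubled' => ['source' => 'score', 'transformers' => [
            function ($v) { return 2 * $v; },
            function ($v) { return $v + 1; },
        ]],
    ],
    // Stand-in "loader" that simply collects the destination records.
    function (array $record) use (&$loaded) : void {
        $loaded[] = $record;
    }
);
```

The two chained callables loosely mirror the `Double` and `Increment` transformers in this repository; the output of one is the input to the next.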
......@@ -43,7 +43,7 @@ To manage the migration process, we have:
Finally, we have:
- [KeyMap][8129d923]: Storage of the relationships between extracted and loaded records (based on the designated unique keys for each). This enables maintaining relationships between keyed records when the keys change during migration (as when loading into an auto-increment SQL table), as well as providing rollback and auditing capabilities. This component is optional - you may implement ETL processes without tracking the keys being processed.
- [Filter][1ea4455f]: A filter simply accepts a `DataRecord` and based on the record's property values and its own configuration, decides whether the record should be further processed. Filters may be configured in the base configuration of an extractor (to help define the canonical source data to be migrated), or injected at run time (to, say, process a single specific record for debugging).
- [Filter][1ea4455f]: A filter simply accepts a `Record` and based on the record's property values and its own configuration, decides whether the record should be further processed. Filters may be configured in the base configuration of an extractor (to help define the canonical source data to be migrated), or injected at run time (to, say, process a single specific record for debugging).
[8129d923]: interface_soong_1_1_contracts_1_1_key_map_1_1_key_map.html "KeyMap"
[1ea4455f]: interface_soong_1_1_contracts_1_1_filter_1_1_filter.html "Filter"
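To make the filter contract concrete, here is a minimal sketch of a filter and of an extractor-style loop that applies a list of filters. Records are plain arrays rather than `Record` objects, the filter is not a `ConfigurableComponent`, and `EqualsFilterExample` and `extractFilteredExample` are hypothetical names used only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical filter: accepts a record only if one property equals a value.
final class EqualsFilterExample
{
    private $propertyName;
    private $testValue;

    public function __construct(string $propertyName, $testValue)
    {
        $this->propertyName = $propertyName;
        $this->testValue = $testValue;
    }

    // TRUE if the record should be processed, FALSE if it should be skipped.
    public function filter(array $record) : bool
    {
        return ($record[$this->propertyName] ?? null) === $this->testValue;
    }
}

// Yield only the records that every filter accepts.
function extractFilteredExample(iterable $records, array $filters) : iterable
{
    foreach ($records as $record) {
        foreach ($filters as $filter) {
            if (!$filter->filter($record)) {
                // One rejection is enough: move on to the next record.
                continue 2;
            }
        }
        yield $record;
    }
}

$source = [
    ['id' => 1, 'status' => 'draft'],
    ['id' => 2, 'status' => 'published'],
    ['id' => 3, 'status' => 'published'],
];
$kept = iterator_to_array(
    extractFilteredExample($source, [new EqualsFilterExample('status', 'published')]),
    false
);
```

Injecting a filter at run time, as suggested above for debugging a single record, would amount to passing an additional filter object in the list.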
......@@ -11,7 +11,7 @@ arraytosql:
class: Soong\Task\EtlTask
# Configuration passed to the Task class at creation time.
configuration:
# DataRecord class the Task will create to hold destination properties.
# Record class the Task will create to hold destination properties.
record_class: Soong\Data\Record
# The KeyMap object stores the mappings from source record keys to
# destination record keys.
......@@ -33,7 +33,7 @@ arraytosql:
# array within its configuration.
class: Soong\Extractor\ArrayExtractor
configuration:
# The concrete DataRecord class we will return for each source record.
# The concrete Record class we will return for each source record.
data_record_class: Soong\Data\Record
# Within the source data, the unique key is named "id" and is an integer.
# The KeyMap uses this information to create a map table and populate it.
......
......@@ -6,7 +6,7 @@ namespace Soong\Contracts\Data;
/**
* Immutable data property.
*/
interface DataProperty
interface Property
{
/**
......
......@@ -6,7 +6,7 @@ namespace Soong\Contracts\Data;
/**
* Collection of named data properties.
*/
interface DataRecord
interface Record
{
/**
......@@ -18,25 +18,25 @@ interface DataRecord
public function toArray() : array;
/**
* Set a property from an existing DataProperty object.
* Set a property from an existing Property object.
*
* @param string $propertyName
* Name of the property to set.
* @param DataProperty $propertyValue
* @param Property $propertyValue
* Property value to set.
*/
public function setProperty(string $propertyName, DataProperty $propertyValue) : void;
public function setProperty(string $propertyName, Property $propertyValue) : void;
/**
* Retrieve a property value as a DataProperty.
* Retrieve a property value as a Property.
*
* @param string $propertyName
* Name of the property to get.
*
* @return DataProperty
* @return Property
* Value of the property.
*/
public function getProperty(string $propertyName) : DataProperty;
public function getProperty(string $propertyName) : Property;
/**
* Does the named property exist in this record?
......
......@@ -4,10 +4,10 @@ declare(strict_types=1);
namespace Soong\Contracts\Extractor;
use Soong\Contracts\Configuration\ConfigurableComponent;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
/**
* Extractors turn a data source into a series of DataRecords.
* Extractors turn a data source into a series of Records.
*/
interface Extractor extends ConfigurableComponent
{
......@@ -15,7 +15,7 @@ interface Extractor extends ConfigurableComponent
/**
* Iterator taking into account any applicable filtering.
*
* @return DataRecord[]
* @return Record[]
* One data record from the source being extracted.
*/
public function extractFiltered() : iterable;
......@@ -27,7 +27,7 @@ interface Extractor extends ConfigurableComponent
* in the raw source which are not part of the logical source for this
* extraction process.
*
* @return DataRecord[]
* @return Record[]
* One data record from the source being extracted.
*/
public function extractAll() : iterable;
......
......@@ -4,10 +4,10 @@ declare(strict_types=1);
namespace Soong\Contracts\Filter;
use Soong\Contracts\Configuration\ConfigurableComponent;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
/**
* Filters decide whether a DataRecord should or should not be processed.
* Filters decide whether a Record should or should not be processed.
*/
interface Filter extends ConfigurableComponent
{
......@@ -15,11 +15,11 @@ interface Filter extends ConfigurableComponent
/**
* Decide whether a data record should be processed.
*
* @param \Soong\Contracts\Data\DataRecord $dataRecord
* @param \Soong\Contracts\Data\Record $record
* Record to examine.
*
* @return bool
* TRUE if the record should be processed, FALSE if it should be skipped.
*/
public function filter(DataRecord $dataRecord) : bool;
public function filter(Record $record) : bool;
}
......@@ -4,10 +4,10 @@ declare(strict_types=1);
namespace Soong\Contracts\Loader;
use Soong\Contracts\Configuration\ConfigurableComponent;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
/**
* Loaders take one DataRecord at a time and load them into a destination.
* Loaders take one Record at a time and load them into a destination.
*/
interface Loader extends ConfigurableComponent
{
......@@ -15,10 +15,10 @@ interface Loader extends ConfigurableComponent
/**
* This needs to return disposition (success, failure) and key of result.
*
* @param DataRecord $data
* @param Record $data
* Data to be loaded into the destination.
*/
public function load(DataRecord $data) : void;
public function load(Record $data) : void;
/**
* List the properties available in records generated by this extractor.
......
......@@ -4,7 +4,7 @@ declare(strict_types=1);
namespace Soong\Contracts\Transformer;
use Soong\Contracts\Configuration\ConfigurableComponent;
use Soong\Contracts\Data\DataProperty;
use Soong\Contracts\Data\Property;
/**
* Accept a data property and turn it into another data property.
......@@ -15,11 +15,11 @@ interface Transformer extends ConfigurableComponent
/**
* Accept a data property and turn it into another data property.
*
* @param DataProperty $data
* @param Property $data
* Property containing data to be transformed.
*
* @return DataProperty
* @return Property
* The transformed data.
*/
public function transform(DataProperty $data) : DataProperty;
public function transform(Property $data) : Property;
}
......@@ -3,12 +3,10 @@ declare(strict_types=1);
namespace Soong\Data;
use Soong\Contracts\Data\DataProperty;
/**
* Immutable data property wrapper implementation.
*/
class Property implements DataProperty
class Property implements \Soong\Contracts\Data\Property
{
/**
......
......@@ -3,13 +3,12 @@ declare(strict_types=1);
namespace Soong\Data;
use Soong\Contracts\Data\DataProperty;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Property;
/**
* Basic implementation of data records as arrays.
*/
class Record implements DataRecord
class Record implements \Soong\Contracts\Data\Record
{
/**
......@@ -17,7 +16,7 @@ class Record implements DataRecord
*
* Array of data properties, keyed by property name.
*
* @var DataProperty $data[]
* @var Property $data[]
*/
protected $data = [];
......@@ -30,15 +29,15 @@ class Record implements DataRecord
public function __construct(array $data = [])
{
foreach ($data as $propertyName => $propertyValue) {
// @todo Inject DataProperty implementation.
$this->setProperty($propertyName, new Property($propertyValue));
// @todo Inject Property implementation.
$this->setProperty($propertyName, new \Soong\Data\Property($propertyValue));
}
}
/**
* @inheritdoc
*/
public function setProperty(string $propertyName, DataProperty $propertyValue) : void
public function setProperty(string $propertyName, Property $propertyValue) : void
{
$this->data[$propertyName] = $propertyValue;
}
......@@ -46,7 +45,7 @@ class Record implements DataRecord
/**
* @inheritdoc
*/
public function getProperty(string $propertyName) : DataProperty
public function getProperty(string $propertyName) : Property
{
return isset($this->data[$propertyName]) ? $this->data[$propertyName] : $this->nullProperty();
}
......@@ -56,13 +55,13 @@ class Record implements DataRecord
*
* Provide a property with a null value.
*
* @return DataProperty
* @return Property
* A property object containing a null value.
*/
protected function nullProperty() : DataProperty
protected function nullProperty() : Property
{
// @todo Inject property implementation.
return new Property(null);
return new \Soong\Data\Property(null);
}
/**
......@@ -71,7 +70,7 @@ class Record implements DataRecord
public function toArray() : array
{
$result = [];
/** @var DataProperty $propertyValue */
/** @var Property $propertyValue */
foreach ($this->data as $propertyName => $propertyValue) {
if (!is_null($propertyValue)) {
$result[$propertyName] = $propertyValue->getValue();
......
......@@ -30,7 +30,7 @@ class ArrayExtractor extends CountableExtractorBase
public function extractAll() : iterable
{
foreach ($this->getConfigurationValue('data') as $data) {
// @todo: Inject DataRecord implementation.
// @todo: Inject Record implementation.
yield new Record($data);
}
}
......
......@@ -4,7 +4,7 @@ declare(strict_types=1);
namespace Soong\Extractor;
use League\Csv\Reader;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
/**
* CSV extractor based on The League CSV library.
......@@ -35,7 +35,7 @@ class Csv extends ExtractorBase
public function extractAll(): iterable
{
$csv = $this->loadCsv();
/** @var DataRecord $recordClass */
/** @var Record $recordClass */
$recordClass = $this->getConfigurationValue('record_class');
foreach ($csv->getRecords() as $record) {
yield new $recordClass($record);
......
......@@ -5,7 +5,7 @@ namespace Soong\Extractor;
use Doctrine\DBAL\DBALException;
use Doctrine\DBAL\FetchMode;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
/**
* Extractor for DBAL SQL queries.
......@@ -46,9 +46,9 @@ class DBAL extends CountableExtractorBase
/** @var \Doctrine\DBAL\Driver\Statement $statement */
$statement = $this->connection()->executeQuery($this->getConfigurationValue('query'));
while ($row = $statement->fetch(FetchMode::ASSOCIATIVE)) {
/** @var DataRecord $dataRecordClass */
$dataRecordClass = $this->getConfigurationValue('data_record_class');
yield new $dataRecordClass($row);
/** @var Record $recordClass */
$recordClass = $this->getConfigurationValue('data_record_class');
yield new $recordClass($row);
}
} catch (DBALException $e) {
// @todo
......
......@@ -57,17 +57,17 @@ abstract class ExtractorBase extends OptionsResolverComponent implements Extract
*/
public function extractFiltered() : iterable
{
foreach ($this->extractAll() as $dataRecord) {
foreach ($this->extractAll() as $record) {
$yield = true;
/** @var \Soong\Contracts\Filter\Filter $filter */
foreach ($this->getConfigurationValue('filters') as $filter) {
if (!$filter->filter($dataRecord)) {
if (!$filter->filter($record)) {
$yield = false;
break;
}
}
if ($yield) {
yield $dataRecord;
yield $record;
}
}
}
......
......@@ -4,7 +4,7 @@ declare(strict_types=1);
namespace Soong\Filter;
use Soong\Configuration\OptionsResolverComponent;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
use Soong\Contracts\Exception\UnrecognizedOperator;
use Soong\Contracts\Filter\Filter;
......@@ -59,12 +59,12 @@ class Select extends OptionsResolverComponent implements Filter
/**
* @inheritdoc
*/
public function filter(DataRecord $dataRecord): bool
public function filter(Record $record): bool
{
$criteria = $this->getConfigurationValue('criteria');
foreach ($criteria as $criterion) {
[$propertyName, $operator, $testValue] = $criterion;
$propertyValue = $dataRecord->getProperty($propertyName)->getValue();
$propertyValue = $record->getProperty($propertyName)->getValue();
switch ($operator) {
case '=':
case '==':
......
......@@ -3,7 +3,7 @@ declare(strict_types=1);
namespace Soong\Loader;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
use Soong\Data\Property;
/**
......@@ -26,7 +26,7 @@ class Csv extends LoaderBase
/**
* @inheritdoc
*/
public function load(DataRecord $data) : void
public function load(Record $data) : void
{
// @todo: Don't use concrete Property class.
$data->setProperty(
......
......@@ -4,7 +4,7 @@ declare(strict_types=1);
namespace Soong\Loader;
use Doctrine\DBAL\DBALException;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
use Soong\Data\Property;
/**
......@@ -35,7 +35,7 @@ class DBAL extends LoaderBase
/**
* @inheritdoc
*/
public function load(DataRecord $data) : void
public function load(Record $data) : void
{
try {
$this->connection()->insert(
......@@ -45,7 +45,7 @@ class DBAL extends LoaderBase
$id = $this->connection()->lastInsertId();
if ($id) {
$keyKeys = array_keys($this->getKeyProperties());
// @todo Inject DataProperty instance.
// @todo Inject Property instance.
$data->setProperty(reset($keyKeys), new Property($id));
}
} catch (DBALException $e) {
......
......@@ -3,7 +3,7 @@ declare(strict_types=1);
namespace Soong\Loader;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
/**
* Loader for testing/debugging pipelines.
......@@ -14,7 +14,7 @@ class PrintR extends LoaderBase
/**
* @inheritdoc
*/
public function load(DataRecord $data) : void
public function load(Record $data) : void
{
print_r($data);
}
......
......@@ -3,7 +3,7 @@ declare(strict_types=1);
namespace Soong\Loader;
use Soong\Contracts\Data\DataRecord;
use Soong\Contracts\Data\Record;
/**
* Loader for testing/debugging pipelines.
......@@ -14,7 +14,7 @@ class VarDump extends LoaderBase
/**
* @inheritdoc
*/
public function load(DataRecord $data) : void
public function load(Record $data) : void
{
var_dump($data);
}
......
......@@ -101,12 +101,12 @@ class EtlTask extends Task implements EtlTaskInterface
$extractor = $this->getExtractor($options);
$loader = $this->getLoader();
$keyMap = $this->getKeyMap();
/** @var \Soong\Contracts\Data\DataRecord $recordClass */
/** @var \Soong\Contracts\Data\Record $recordClass */
$recordClass = $taskConfiguration['record_class'];
/** @var \Soong\Contracts\Data\DataRecord $data */
/** @var \Soong\Contracts\Data\Record $data */
foreach ($extractor->extractFiltered() as $data) {
/** @var \Soong\Contracts\Data\DataRecord $resultData */
/** @var \Soong\Contracts\Data\Record $resultData */
$resultData = new $recordClass();
if (isset($taskConfiguration['transform'])) {
foreach ($taskConfiguration['transform'] as $property => $transformerList) {
......
......@@ -3,7 +3,7 @@ declare(strict_types=1);
namespace Soong\Transformer;
use Soong\Contracts\Data\DataProperty;
use Soong\Contracts\Data\Property;
/**
* Transformer to simply copy extracted data to the destination.
......@@ -14,8 +14,8 @@ class Copy extends TransformerBase
/**
* @inheritdoc
*/
public function transform(DataProperty $data) : DataProperty
public function transform(Property $data) : Property
{
return $data;
return clone $data;
}
}
......@@ -3,8 +3,7 @@ declare(strict_types=1);
namespace Soong\Transformer;
use Soong\Contracts\Data\DataProperty;
use Soong\Data\Property;
use Soong\Contracts\Data\Property;
/**
* Transformer to multiply the extracted data value by 2.
......@@ -15,9 +14,9 @@ class Double extends TransformerBase
/**
* @inheritdoc
*/
public function transform(DataProperty $data) : DataProperty
public function transform(Property $data) : Property
{
// @todo Don't use concrete class
return new Property(2 * $data->getValue());
return new \Soong\Data\Property(2 * $data->getValue());
}
}
......@@ -3,8 +3,7 @@ declare(strict_types=1);
namespace Soong\Transformer;
use Soong\Contracts\Data\DataProperty;
use Soong\Data\Property;
use Soong\Contracts\Data\Property;
/**
* Transformer to add 1 to the extracted data.
......@@ -15,9 +14,9 @@ class Increment extends TransformerBase
/**
* @inheritdoc
*/
public function transform(DataProperty $data) : DataProperty
public function transform(Property $data) : Property
{
// @todo Don't use concrete class
return new Property($data->getValue() + 1);
return new \Soong\Data\Property($data->getValue() + 1);
}
}
......@@ -3,8 +3,7 @@ declare(strict_types=1);
namespace Soong\Transformer;
use Soong\Data\Property;
use Soong\Contracts\Data\DataProperty;
use Soong\Contracts\Data\Property;
use Soong\Contracts\Task\TaskPipeline;
/**
......@@ -38,7 +37,7 @@ class KeyMapLookup extends TransformerBase
/**
* @inheritdoc
*/
public function transform(DataProperty $data) : DataProperty
public function transform(Property $data) : Property
{
if (!$data->isEmpty()) {
$keyMapConfig = $this->getConfigurationValue('key_map');
......@@ -52,10 +51,10 @@ class KeyMapLookup extends TransformerBase
if (!empty($loadedKey)) {
// @todo: Handle multi-value keys properly.
// @todo Don't use concrete class
return new Property(reset($loadedKey));