Each pipeline step should process individual records instead of a collection
Problem to solve
Currently, Bulk Import pipelines can either pass the whole collection through the pipeline at once or process it record by record, depending on the pipeline (e.g. `SubGroupEntitiesPipeline` vs `EpicsPipeline`). Pipelines that use the GraphQL extractor pass the whole collection through the pipeline, which worked well until we started to need record-by-record processing.
For example, in order to implement #297459 (closed) we need to pass individual records through the pipeline, since during the transformation phase we check whether the parent epic exists in the database in order to link the epic with its parent. This can't be done when passing the whole collection at once, since no parent is persisted yet.
Proposed solution
- Create a new standardised `ExtractedData` response class to wrap responses from the GraphQL/REST APIs, so that we have a unified interface (see the sketches after this list).
- Update `BulkImports::Common::Extractors::GraphqlExtractor` to use the newly created response class and return an `ExtractedData` object instead of a raw hash.
- Update `BulkImports::Common::Extractors::GraphqlExtractor` to add a new `key_path` option that replaces `HashKeyDigger`.
- Remove the `HashKeyDigger` transformer entirely, since there would be no need for it anymore.
- Make sure `BulkImports::Pipeline::Runner` yields individual records instead of the whole collection to transformers, so that by the time the second record is being processed, the first one is already persisted.
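
A minimal sketch of what the `ExtractedData` wrapper could look like, assuming cursor-based `page_info` from GraphQL and a plain array (or a single record) from REST; the attribute and method names here are illustrative, not final:

```ruby
module BulkImports
  module Pipeline
    # Unified wrapper around GraphQL/REST responses (names are illustrative)
    class ExtractedData
      attr_reader :data

      def initialize(data: [], page_info: {})
        # Normalize a single record or an array into a flat list of records
        @data = data.is_a?(Array) ? data : [data]
        @page_info = page_info
      end

      # Cursor-based pagination info from GraphQL; empty for REST responses
      def has_next_page?
        @page_info['has_next_page'] == true
      end

      def next_page
        @page_info['end_cursor']
      end

      # Yield record by record so each pipeline step handles a single entry
      def each(&block)
        data.each(&block)
      end
    end
  end
end
```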
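
The extractor could then dig into the response itself via the new `key_path` option and wrap the result, removing the need for `HashKeyDigger`. This sketch assumes `context.graphql_client.execute` returns a parsed response hash with `nodes`/`page_info` keys; those names are assumptions, not the actual client API:

```ruby
module BulkImports
  module Common
    module Extractors
      class GraphqlExtractor
        # query: an object responding to #to_s (the GraphQL query string)
        # key_path: path into the response hash, e.g. %w[data group epics]
        def initialize(query:, key_path:)
          @query = query
          @key_path = key_path
        end

        def extract(context)
          # Assumption: the context exposes a client whose #execute returns a Hash
          response = context.graphql_client.execute(@query.to_s)

          # Dig with key_path here instead of running HashKeyDigger later
          payload = response.dig(*@key_path) || {}

          BulkImports::Pipeline::ExtractedData.new(
            data: payload['nodes'] || [],
            page_info: payload['page_info'] || {}
          )
        end
      end
    end
  end
end
```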
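
Finally, a hedged sketch of how `BulkImports::Pipeline::Runner` could iterate over the wrapped data record by record; the `extractor`/`transformers`/`loaders` helpers are assumptions about the pipeline DSL, used only to show the flow:

```ruby
module BulkImports
  module Pipeline
    module Runner
      def run(context)
        extracted_data = extractor.extract(context)

        # Process one record at a time: by the time the second epic is
        # transformed, the first one (its potential parent) is already
        # persisted, so the parent lookup in the database can succeed.
        extracted_data.each do |entry|
          transformers.each do |transformer|
            entry = transformer.transform(context, entry)
          end

          loaders.each do |loader|
            loader.load(context, entry)
          end
        end
      end
    end
  end
end
```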