Each pipeline step should process individual records instead of a collection
Problem to solve
Currently, Bulk Import pipelines pass data through the pipeline either as a whole collection at once or record by record, depending on the pipeline (e.g. `SubGroupEntitiesPipeline` vs `EpicsPipeline`). Pipelines that use the GraphQL extractor pass the whole collection through the pipeline, which worked well until we started to need record-by-record processing.
For example, in order to implement #297459 (closed) we need to pass individual records through the pipeline, since during the transformation phase we check whether an epic's parent already exists in the database in order to link the epic with its parent. This can't be done when passing the whole collection at once (since no parent is persisted yet).
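As a hedged illustration of the ordering problem (the class, method, and attribute names below are hypothetical, not the actual GitLab code): a transformer that links an epic to its parent can only find the parent in the database if that parent was persisted before the current record is transformed, which is impossible when the entire collection is transformed before anything is loaded.

```ruby
# Hypothetical transformer sketch -- names and structure are illustrative only.
module BulkImports
  module Groups
    module Transformers
      class EpicParentLinker
        def transform(context, entry)
          parent_iid = entry.dig('parent', 'iid')
          return entry unless parent_iid

          # This lookup only succeeds if the parent epic was already loaded
          # (persisted) by an earlier pass through the pipeline. When the
          # whole collection is transformed at once, nothing is persisted
          # yet, so the parent can never be found here.
          parent = context.group.epics.find_by(iid: parent_iid)
          entry['parent_id'] = parent&.id

          entry
        end
      end
    end
  end
end
```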
Proposed solution
- Create a new standardised `ExtractedData` response class to wrap responses from the GraphQL/REST APIs and provide a unified interface (see the sketch after this list).
- Update `BulkImports::Common::Extractors::GraphqlExtractor` to use the newly created response class and return an `ExtractedData` object instead of a raw hash.
- Update `BulkImports::Common::Extractors::GraphqlExtractor` to add a new `key_path` option that replaces `HashKeyDigger`.
- Remove the `HashKeyDigger` transformer entirely, since it would no longer be needed.
- Make sure `BulkImports::Pipeline::Runner` yields individual records to transformers instead of the whole collection, so that by the time the second record is being processed, the first one is already persisted (see the runner sketch below).
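A minimal sketch of what the proposed `ExtractedData` wrapper and the `key_path` option could look like; the method names, signatures, and pagination details here are assumptions, not the final implementation (`Array.wrap` comes from ActiveSupport, which is available in the GitLab codebase).

```ruby
module BulkImports
  module Pipeline
    # Unified wrapper around GraphQL/REST responses so that every extractor
    # returns the same enumerable interface instead of a raw hash.
    class ExtractedData
      attr_reader :data, :page_info

      def initialize(data: nil, page_info: {})
        @data = Array.wrap(data)
        @page_info = page_info
      end

      # Pipelines iterate over individual records rather than one collection.
      def each(&block)
        data.each(&block)
      end

      def has_next_page?
        !!page_info['has_next_page']
      end
    end
  end
end

module BulkImports
  module Common
    module Extractors
      # With a key_path option, the extractor digs the relevant nodes out of
      # the GraphQL response itself, so the HashKeyDigger transformer is no
      # longer needed.
      class GraphqlExtractor
        def initialize(query:, key_path: [])
          @query = query
          @key_path = key_path
        end

        def extract(context)
          response = execute_query(context) # response hash from the GraphQL API

          BulkImports::Pipeline::ExtractedData.new(
            data: response.dig(*@key_path, 'nodes'),
            page_info: response.dig(*@key_path, 'pageInfo').to_h
          )
        end

        private

        def execute_query(context)
          # GraphQL client call omitted for brevity.
        end
      end
    end
  end
end
```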
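And a rough sketch of the record-by-record run loop in `BulkImports::Pipeline::Runner` (the `extractor`/`transformers`/`loader` accessors are assumed here): each record goes through every transformer and the loader before the next record is processed, so the parent epic from the example above is persisted before its children are transformed.

```ruby
module BulkImports
  module Pipeline
    module Runner
      def run(context)
        extracted_data = extractor.extract(context)

        # Yield records one at a time: record N is transformed and loaded
        # (persisted) before record N + 1 is processed.
        extracted_data.each do |entry|
          transformers.each do |transformer|
            entry = transformer.transform(context, entry)
          end

          loader.load(context, entry)
        end
      end
    end
  end
end
```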