Each pipeline step should process individual records instead of a collection

Problem to solve

Currently, Bulk Import pipelines either pass the whole collection through the pipeline at once or process it record by record, depending on the pipeline (e.g. SubGroupEntitiesPipeline vs EpicsPipeline). Pipelines that use the GraphQL extractor pass the whole collection through the pipeline, which worked well until we started to need record-by-record processing.

For example, in order to implement #297459 (closed) we need to pass individual records through the pipeline, since during the transformation phase we check whether the parent epic exists in the database in order to link the epic with its parent. This can't be done when the whole collection is passed at once, since no parent is persisted yet.
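
As a rough illustration (not the actual implementation), a transformer in this scenario would need to look up the already-persisted parent for every individual record. The transformer name and the context/group accessors below are assumptions:

```ruby
# Hypothetical transformer, for illustration only: links an epic to its parent,
# which only works if the parent was persisted on an earlier iteration.
class EpicParentLinkTransformer
  def transform(context, entry)
    parent_iid = entry.dig('parent', 'iid')
    return entry unless parent_iid

    # Assumes the pipeline context exposes the destination group; the parent
    # epic can only be found if it was already imported record by record.
    parent = context.group.epics.find_by(iid: parent_iid)
    entry['parent_id'] = parent&.id
    entry
  end
end
```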

Proposed solution

  1. Create a new standardised ExtractedData response class to wrap responses from the GraphQL/REST APIs behind a unified interface (see the first sketch after this list).
  2. Update BulkImports::Common::Extractors::GraphqlExtractor to use the newly created response class and return an ExtractedData object instead of a raw hash.
  3. Update BulkImports::Common::Extractors::GraphqlExtractor to add a new key_path option that replaces HashKeyDigger (see the second sketch below).
  4. Remove the HashKeyDigger transformer entirely, since it would no longer be needed.
  5. Make sure BulkImports::Pipeline::Runner yields individual records instead of the whole collection to transformers, so that by the time the second record is processed, the first one is already persisted (see the third sketch below).
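
A minimal sketch of what the ExtractedData wrapper from step 1 could look like, assuming it only needs to expose the extracted records as an enumerable plus the GraphQL pagination info; the method names here are illustrative, not a final interface:

```ruby
module BulkImports
  module Pipeline
    # Unified wrapper around GraphQL/REST responses (step 1). Pipelines iterate
    # over it record by record instead of receiving a raw hash.
    class ExtractedData
      include Enumerable

      attr_reader :data, :page_info

      def initialize(data: nil, page_info: {})
        @data = data.is_a?(Array) ? data : [data].compact
        @page_info = page_info
      end

      def has_next_page?
        page_info['has_next_page'] == true
      end

      def each(&block)
        data.each(&block)
      end
    end
  end
end
```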
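
Steps 2–4 could then look roughly like the sketch below: the extractor digs the records out of the raw response using a key_path option and wraps them in ExtractedData, so the separate HashKeyDigger transformer is no longer needed. The client interface, the context accessors, and the page_info path are assumptions for illustration:

```ruby
module BulkImports
  module Common
    module Extractors
      class GraphqlExtractor
        def initialize(query:, key_path:, client:)
          @query = query
          @key_path = key_path # e.g. %w[data group epics nodes]
          @client = client     # hypothetical GraphQL client object
        end

        def extract(context)
          # Raw hash response from the GraphQL API.
          response = @client.execute(@query, context.entity)

          BulkImports::Pipeline::ExtractedData.new(
            data: response.dig(*@key_path),
            page_info: response.dig('data', 'group', 'epics', 'page_info')
          )
        end
      end
    end
  end
end
```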
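
Finally, step 5 amounts to the runner iterating the ExtractedData and pushing each record through the transformers and the loader one at a time, so record N is persisted before record N+1 is transformed. A simplified sketch, with error handling and the real extractor/transformer/loader wiring omitted:

```ruby
module BulkImports
  module Pipeline
    module Runner
      def run(context)
        extracted = extractor.extract(context)

        extracted.each do |entry|
          transformers.each do |transformer|
            entry = transformer.transform(context, entry)
          end

          # Persisting per record means a parent epic is already in the
          # database when its children are transformed on later iterations.
          loader.load(context, entry)
        end
      end
    end
  end
end
```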