Processing large CSV files and dealing with memory: add offset and limit

Problem / Motivation

If you're processing a large CSV file, you may run into memory issues. Or, if your migration's processing uses a lot of memory on each row, you may run into memory issues even on small files.

See https://www.drupal.org/project/migrate_tools/issues/2701121 for discussion of memory issues with migrations.

Patrick Weston at Palantir also describes a novel batching solution: https://www.palantir.net/blog/running-large-drupal-8-csv-migrations-batches.

The motivation for this ticket, however, is to explore whether the source plugin itself could take an offset and a limit, allowing it to operate similarly to Drupal\migrate\Plugin\migrate\source\SqlBase, which has batching built in.

Solution

Follow the pattern from SqlBase to add an offset and limit (called "batch" and "batch_size") to the configuration options for the CSV source plugin.
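
For example, a migration definition might set the new keys under the source configuration like this (a sketch only: the path and ids values are placeholders, and batch/batch_size are the option names proposed above, not yet part of the plugin):

```yaml
source:
  plugin: csv
  path: /path/to/large-file.csv
  ids:
    - id
  # Proposed options, mirroring SqlBase: process the third batch of
  # 1000 rows, i.e. skip batch * batch_size rows before importing.
  batch: 2
  batch_size: 1000
```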

Since MigrateExecutable and MigrateBatchExecutable already implement a limit, the only thing that needs to be implemented here is the offset.
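
Here is a minimal sketch of how that offset could be applied, assuming the module's CSV plugin class can be subclassed and that its initializeIterator() returns an \Iterator; the module/class names, the csv_batched plugin ID, and the batch * batch_size offset calculation are illustrative, not a committed design:

```php
<?php

namespace Drupal\my_module\Plugin\migrate\source;

use Drupal\migrate_source_csv\Plugin\migrate\source\CSV;

/**
 * CSV source plugin with a row offset, mirroring SqlBase's batching.
 *
 * @MigrateSource(
 *   id = "csv_batched"
 * )
 */
class BatchedCsv extends CSV {

  /**
   * {@inheritdoc}
   */
  public function initializeIterator() {
    $iterator = parent::initializeIterator();

    // Hypothetical configuration keys proposed in this ticket.
    $batch = (int) ($this->configuration['batch'] ?? 0);
    $batch_size = (int) ($this->configuration['batch_size'] ?? 0);

    if ($batch_size > 0) {
      // Skip the rows consumed by earlier batches. The limit itself is
      // left to MigrateExecutable, so only the offset is applied here.
      return new \LimitIterator($iterator, $batch * $batch_size);
    }

    return $iterator;
  }

}
```

Wrapping the parent iterator in \LimitIterator keeps the offset out of the file-reading code, though it still advances past the skipped rows one at a time rather than seeking; a deeper integration could seek within the file instead.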
