Allow callback for Url source, and single item Json plugin

Issue #3040427 on drupal.org by rollsd.

Problem

Much like the problem outlined in #2608610: Add support for list/item pattern, we have a need to import a bunch of items where the full list of those items (with minimal metadata like id) is defined in an endpoint like:

And each item can be retrieved in full as a single JSON object via:

I have looked into the solution on that issue, and it looks promising; however, the latest patches don't apply. I also feel that defining URI templates and child item selectors adds complexity without addressing all use cases where we want to generate the source URLs dynamically.

Also, the item endpoint I am using returns just a single JSON object, not an array as the Json data parser assumes.

Proposed solution

Let's just allow the Url source plugin to take a callback that does the work of generating the URLs in advance.

This would have other benefits beyond providing a way to implement the list/item pattern. For example, you may want the list of source URLs to be paginated: if you are importing a catalog of 10,000 products and want to fetch them in batches of 200 (to balance memory limits against the economy of making fewer HTTP requests), your callback could generate the URLs. Each URL might look like:

The other part of the solution would require a way to configure the migration so that it knows a source URL might contain a single object. We could explicitly set item_selector to FALSE to indicate this.
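
As a rough illustration of that idea, the sketch below shows one way a parser could treat item_selector: FALSE as "the whole response is one item". The helper name my_module_select_items() and the simplified slash-separated selector handling are assumptions made for this example; this is not the actual Json data parser code.

```php
<?php

/**
 * Hypothetical helper: turns a decoded JSON document into a list of items.
 *
 * @param array $decoded
 *   The result of json_decode($response, TRUE).
 * @param string|false $item_selector
 *   A slash-separated path such as "data/items", or FALSE when the whole
 *   document is a single item (the behaviour proposed in this issue).
 */
function my_module_select_items(array $decoded, $item_selector): array {
  // Proposed behaviour: FALSE wraps the single object in a one-element list.
  if ($item_selector === FALSE) {
    return [$decoded];
  }

  // Simplified selector handling: walk down the path one key at a time.
  foreach (explode('/', trim((string) $item_selector, '/')) as $key) {
    if ($key === '') {
      continue;
    }
    if (!is_array($decoded) || !array_key_exists($key, $decoded)) {
      // Nothing found at this path, so there are no items.
      return [];
    }
    $decoded = $decoded[$key];
  }

  return is_array($decoded) ? $decoded : [];
}
```

With this shape, the rest of the import can keep iterating over a list of items, whether the endpoint returned one object or many.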

To get this working the source configuration would be something like:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls:
    callback: my_module_migrate_urls
  item_selector: false
  skip_count: true

Note: skip_count: true is important if you have a large number of URLs; otherwise commands like drush migrate-status will iterate through all requested URLs to get a total count.
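
To make the configuration above more concrete, here is a minimal sketch of how the source could resolve its urls setting. The function name my_module_resolve_urls() is hypothetical and the logic is an assumption about how a patch might work, not code taken from the Url source plugin.

```php
<?php

/**
 * Hypothetical sketch: resolves the "urls" source setting to a list of URLs.
 *
 * @param array $configuration
 *   The source plugin configuration.
 * @param mixed $migration
 *   The current migration, handed to the callback unchanged.
 */
function my_module_resolve_urls(array $configuration, $migration): array {
  $urls = $configuration['urls'] ?? [];

  // Proposed form: urls: { callback: some_function_name }.
  if (is_array($urls) && isset($urls['callback'])) {
    if (!is_callable($urls['callback'])) {
      throw new InvalidArgumentException('The urls callback is not callable.');
    }
    $urls = call_user_func($urls['callback'], $migration);
  }

  // Existing forms: a single URL string or a plain list of URLs.
  return array_values((array) $urls);
}
```

Keeping the string and list forms working alongside the callback form means existing migrations would not need to change.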

The current migration is passed to the callback. This allows further tweaking, such as ensuring that only URLs we intend to import in the current migration are returned, like so:

use Drupal\migrate\Plugin\MigrationInterface;
use Drupal\migrate\Plugin\MigrateIdMapInterface;

/**
 * Returns the source URLs for items that have not been imported yet.
 */
function my_module_migrate_urls(MigrationInterface $migration) {
  // my_module_get_all_ids() is assumed to return every ID from the list endpoint.
  $ids = my_module_get_all_ids();
  $urls = [];

  // Let's exclude URLs for items already marked as imported.
  // getDatabase() and mapTableName() are not part of MigrateIdMapInterface;
  // this assumes the default Sql id map plugin is in use.
  $id_map = $migration->getIdMap();
  $imported_ids = $id_map->getDatabase()->select($id_map->mapTableName(), 'm')
    ->fields('m', ['sourceid1'])
    ->condition('source_row_status', MigrateIdMapInterface::STATUS_IMPORTED)
    ->execute()
    ->fetchCol();
  $ids = array_diff($ids, $imported_ids);

  foreach ($ids as $id) {
    $urls[] = "https://example.com/jsonapi/node/article/{$id}";
  }
  return $urls;
}
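
The batching use case mentioned under "Proposed solution" could be handled by a similar callback. The following is only a sketch: the function name, the example.com endpoint, and the offset and limit query parameters are all invented for illustration, and the totals are passed as defaulted arguments so the function can be tried in isolation.

```php
<?php

/**
 * Hypothetical callback: returns one URL per batch of products.
 *
 * @param mixed $migration
 *   The current migration (unused in this sketch).
 * @param int $total
 *   Assumed total number of products.
 * @param int $batch_size
 *   Number of products requested per URL.
 */
function my_module_migrate_batch_urls($migration = NULL, int $total = 10000, int $batch_size = 200): array {
  if ($batch_size <= 0) {
    throw new InvalidArgumentException('Batch size must be positive.');
  }

  $urls = [];
  for ($offset = 0; $offset < $total; $offset += $batch_size) {
    // Invented URL pattern; a real API will have its own paging parameters.
    $query = http_build_query(['offset' => $offset, 'limit' => $batch_size]);
    $urls[] = 'https://example.com/api/products?' . $query;
  }
  return $urls;
}
```

With the defaults this yields 50 URLs. Since each of those responses would contain a list of products rather than a single object, item_selector would point at that list instead of being set to false.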