New process plugin to extract a given xpath from a DOMDocument
Issue #3112559 on drupal.org by marvil07.
Problem/Motivation
The general idea behind the dom process plugins is to manipulate HTML before storing it in Drupal.
One possible use case is to extract information from a given part of the markup, and ignore the rest.
Last year at NEDCamp I suggested writing something to do this while someone was telling me about his use case.
Lately I have found myself needing something like this.
A related note: the current dom migrate process plugins in Migrate Plus are more generic than HTML processing, since they handle a \DOMDocument object, which can also be generated from other sources such as XML; so they are useful not only for HTML processing.
Proposed resolution
Create a new migrate process plugin that receives a \DOMDocument, e.g. one generated with the dom plugin, and, given some options, extracts a subset of the data and returns it as a simple string.
Examples
A couple of examples.
process:
  'body/value':
    -
      plugin: dom
      method: import
      source: 'body/0/value'
    -
      plugin: dom_extract
      xpath: '/div/div'
This will extract the contents of the second-level <div> element, i.e. the <div> nested directly inside the top-level <div>.
process:
  field_data_type:
    -
      plugin: dom
      method: import
      source: 'body/0/value'
    -
      plugin: dom_extract
      xpath: '/h1'
      target: 'attribute:data-type'
This will extract the value of the data-type attribute on the top-level h1 element.
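To make the intended behaviour of the two examples above concrete, here is a minimal, language-neutral sketch in Python using only the standard library. It is not the plugin's implementation: the helper name dom_extract(), the default target value "text", the sample markup, and the choice to concatenate multiple matches are all assumptions made for illustration, and the input must be well-formed XML for this parser to accept it.

```python
import xml.etree.ElementTree as ET


def dom_extract(markup, xpath, target="text"):
    """Return the text or an attribute of the nodes matched by `xpath`.

    Hypothetical helper for illustration only; not the plugin's code.
    """
    root = ET.fromstring(markup)
    # ElementTree paths are relative to an element, so hang the parsed root
    # under a synthetic parent; '/div/div' is then evaluated as './div/div'.
    document = ET.Element("document")
    document.append(root)
    path = "." + xpath if xpath.startswith("/") else xpath
    nodes = document.findall(path)

    if target.startswith("attribute:"):
        # 'attribute:data-type' selects the value of the data-type attribute.
        name = target[len("attribute:"):]
        values = [node.get(name, "") for node in nodes]
    else:
        # Otherwise collect the text content of each matched node.
        values = ["".join(node.itertext()) for node in nodes]
    # Assumption: several matches are simply concatenated into one string.
    return "".join(values)


# Sample markup invented for this sketch.
body = "<div><div>Hello <b>world</b></div><p>ignored</p></div>"
print(dom_extract(body, "/div/div"))

heading = '<h1 data-type="news">Title</h1>'
print(dom_extract(heading, "/h1", "attribute:data-type"))
```

The first call mirrors the '/div/div' example and keeps only the text of the inner <div>; the second mirrors the 'attribute:data-type' example. How the real plugin treats multiple matches or missing attributes is left open here.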
Remaining tasks
Provide patch.
Review.
User interface changes
N.A.
API changes
New dom_extract migrate process plugin.
Data model changes
N.A.
Release notes snippet
New dom_extract migrate process plugin to retrieve a subset of a given DOM as a string.