Resolve "Refactor the Node class to a factory function with Node subclass for each toplevel type"
Significant refactor of the nodes system.
The old `Node` class is now a superclass with individual subclasses for each toplevel node type.
`Node` attributes:
- `name` (str): The name of the file the node represents
- `toplevel_type` (str): The toplevel node type, e.g. 'ParticlesData'
- `toplevel_description` (str): A description of the toplevel type
- `type` (str): The full node type, e.g. 'ParticlesData.star.relion.refined'
- `format` (str): The node's generalised file type, e.g. 'mrc' for files with the extension mrc, mrcs or map
- `kwds` (list[str]): Keywords associated with the node
- `output_from_process` (Process): The Process object for the process that created the file
- `input_for_processes_list` (list): A list of Process objects for the processes that use the node as an input
- `format_converter` (Optional[GenNodeFormatConverter]): An object that defines the expected extensions for the node and how to validate them
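The `format` attribute groups several file extensions under one generalised name. A minimal sketch of that idea, assuming a plain lookup table: `GENERALISED_FORMATS` and `generalise_format` are hypothetical names for illustration, not the `GenNodeFormatConverter` API, and only the 'mrc' grouping comes from the list above.

```python
from pathlib import Path

# Hypothetical table: generalised format -> extensions it covers.
# Only the 'mrc' entry is taken from the attribute list; the others are illustrative.
GENERALISED_FORMATS = {
    "mrc": {"mrc", "mrcs", "map"},
    "star": {"star"},
    "txt": {"txt"},
}


def generalise_format(filename: str) -> str:
    """Return the generalised format for a file, e.g. 'mrc' for 'x.mrcs'.

    Falls back to the raw (lower-cased) extension when it is not in the table.
    """
    ext = Path(filename).suffix.lstrip(".").lower()
    for fmt, extensions in GENERALISED_FORMATS.items():
        if ext in extensions:
            return fmt
    return ext
```

With this table, `generalise_format("run_class001.map")` and `generalise_format("particles.mrcs")` both give `'mrc'`, so code that only cares about "is this a map-like file" can compare a single string.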
All of the original `Node` class's attributes are preserved with the same names except `ext`, which was replaced with `format`, so most existing code shouldn't be affected.
The new `Node` superclass has methods for validating the node based on its keywords and for generating a default results display for its type.
To create a new node, use the new function `pipeliner.node_factory.create_node` instead of instantiating a `Node` object directly. It takes the same arguments as the old `Node` class, so:
> node = create_node("my_file.txt", "LogFile", ["kwd1", "kwd2"])
> node.type
LogFile.txt.kwd1.kwd2
> node.format
txt
> node.toplevel_type
LogFile
Nodes are validated on creation, but validation can be turned off by passing `do_validation=False`. This is done when a pipeline is read: re-validating every node each time a large pipeline is read would be too slow, and the nodes were already validated when they were created.
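The factory and the `do_validation` switch can be sketched with a registry that maps each toplevel type to its `Node` subclass. This is a hypothetical minimal version, not the code in `pipeliner.node_factory`: the `NODE_CLASSES` registry, the stripped-down node classes, the example validation rule and the `ValueError` on failure are all assumptions; only the call shape `create_node(name, toplevel_type, kwds, do_validation=...)` follows the description above.

```python
from typing import Dict, List, Optional, Type


class Node:
    """Simplified superclass; each subclass fixes its toplevel type."""

    toplevel_type = ""

    def __init__(self, name: str, kwds: Optional[List[str]] = None) -> None:
        self.name = name
        self.kwds = list(kwds or [])
        # The raw extension stands in for the generalised format in this sketch.
        self.format = name.rsplit(".", 1)[-1] if "." in name else ""
        parts = [self.toplevel_type, self.format, *self.kwds]
        self.type = ".".join(p for p in parts if p)

    def validate(self) -> bool:
        return True


class LogFileNode(Node):
    toplevel_type = "LogFile"


class ParticlesDataNode(Node):
    toplevel_type = "ParticlesData"

    def validate(self) -> bool:
        # Hypothetical rule: particle data must be a .star or .mrcs file.
        return self.format in {"star", "mrcs"}


# Hypothetical registry mapping each toplevel type to its Node subclass.
NODE_CLASSES: Dict[str, Type[Node]] = {
    cls.toplevel_type: cls for cls in (LogFileNode, ParticlesDataNode)
}


def create_node(
    name: str,
    toplevel_type: str,
    kwds: Optional[List[str]] = None,
    do_validation: bool = True,
) -> Node:
    """Instantiate the subclass registered for toplevel_type."""
    try:
        node_cls = NODE_CLASSES[toplevel_type]
    except KeyError:
        raise ValueError(f"Unknown toplevel node type: {toplevel_type}") from None
    node = node_cls(name, kwds)
    # Skipped when reading a pipeline, since its nodes were validated on creation.
    if do_validation and not node.validate():
        raise ValueError(f"{name} failed validation as {toplevel_type}")
    return node
```

Keeping the lookup in one dict means a new toplevel type only needs its subclass defined and registered, and callers never have to know which subclass they get back.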
`PipelinerJob` has been updated with a new default `create_results_display` method. It calls the `default_results_display` method of each node in `self.output_nodes` and returns these `ResultsDisplayObjects`. Jobs now only need to override this method if they want more complex results displays or don't want to display all of their output nodes.
I added `default_results_display` methods for most of the node types, but some, such as 'LogFile', have too many possible formats, so these default to a `ResultsDisplayText` that says 'This node type has no default display'.
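The two levels of results display described above can be sketched as follows, assuming a zero-argument `default_results_display` on each node. `ResultsDisplayText` here is a stand-in dataclass with only `title` and `text` fields, and `DensityMapNode`, `LogFileNode` and `FirstOutputOnlyJob` are hypothetical simplified classes, not the pipeliner implementations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResultsDisplayText:
    """Stand-in for a text results display; only two fields are modelled."""

    title: str
    text: str


class Node:
    def __init__(self, name: str) -> None:
        self.name = name

    def default_results_display(self) -> ResultsDisplayText:
        # Fallback for node types with too many possible formats to display.
        return ResultsDisplayText(self.name, "This node type has no default display")


class LogFileNode(Node):
    """Keeps the superclass fallback."""


class DensityMapNode(Node):
    def default_results_display(self) -> ResultsDisplayText:
        # Hypothetical override; a real one would build an image/map display.
        return ResultsDisplayText(self.name, f"Map preview for {self.name}")


class PipelinerJob:
    def __init__(self, output_nodes: List[Node]) -> None:
        self.output_nodes = output_nodes

    def create_results_display(self) -> List[ResultsDisplayText]:
        # Default: ask every output node for its own default display.
        return [node.default_results_display() for node in self.output_nodes]


class FirstOutputOnlyJob(PipelinerJob):
    """Hypothetical job that overrides the default to show one node only."""

    def create_results_display(self) -> List[ResultsDisplayText]:
        return [node.default_results_display() for node in self.output_nodes[:1]]
```

A job with a map and a log file as outputs would get one preview and one fallback message from the inherited method, without defining `create_results_display` itself.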
I haven't added `keyword_validation` methods to any of the node types yet. These could get very complex (e.g. 'if format=star and "relion" is in the keywords, the starfile must have x, y and z fields'), and the method isn't really used yet. In the future it could be used to validate an `InputNodeJobOption` in the GUI.
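To illustrate the kind of rule a `keyword_validation` method might eventually hold (this MR does not implement one), here is a sketch of the starfile example above. It assumes RELION's `_rlnCoordinateX`/`Y`/`Z` column labels stand for the x, y and z fields, passes the file text in directly instead of reading from disk, and uses a deliberately naive header scan; `star_column_labels`, `REQUIRED_RELION_COLUMNS` and the simplified `ParticlesDataNode` are hypothetical.

```python
from typing import List, Set

# Assumed labels for the x, y and z fields of a RELION starfile.
REQUIRED_RELION_COLUMNS = {"_rlnCoordinateX", "_rlnCoordinateY", "_rlnCoordinateZ"}


def star_column_labels(text: str) -> Set[str]:
    """Collect labels from lines starting with '_' in STAR file text.

    Naive: ignores data blocks and loop structure; a real check would use
    a proper STAR parser.
    """
    labels = set()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_"):
            labels.add(stripped.split()[0])
    return labels


class ParticlesDataNode:
    """Simplified stand-in, not the pipeliner node class."""

    def __init__(self, name: str, fmt: str, kwds: List[str], contents: str) -> None:
        self.name = name
        self.format = fmt
        self.kwds = kwds
        self.contents = contents  # file text supplied directly for this sketch

    def keyword_validation(self) -> bool:
        # Hypothetical rule: RELION starfiles must carry x, y and z coordinates.
        if self.format == "star" and "relion" in self.kwds:
            return REQUIRED_RELION_COLUMNS <= star_column_labels(self.contents)
        return True
```

The rule only fires for the format/keyword combination it targets; every other node passes, which keeps such checks opt-in per node type.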
This MR doesn't include the node renaming agreed upon at the RAL hackathon; that will be done in a separate MR.
Closes #165