Resolve "Refactor the Node class to a factory function with Node subclass for each toplevel type" (!191) · Merge requests · ccpem / ccpem-pipeliner

Matthew Iadanza requested to merge 165-add-node-factory into main Mar 10, 2023

Significant refactor of the nodes system. The old Node class is now a superclass with individual subclasses for each toplevel node type.

Node attributes:

name (str): The name of the file the node represents
toplevel_type (str): The toplevel node type EX 'ParticlesData'
toplevel_description (str): A description of the toplevel type
type (str): The full node type EX 'ParticlesData.star.relion.refined'
format (str): The node's generalised file type EX 'mrc' for 'mrc', 'mrcs' or 'map'
kwds (list[str]): Keywords associated with the node
output_from_process (Process): The Process object for the process that created the file
input_for_processes_list (list): A list of Process objects for processes that use the node as an input
format_converter (Optional[GenNodeFormatConverter]): An object that defines the expected extensions for the node and how to validate them

All of the original Node class's attributes are preserved with the same names, except ext, which was replaced with format, so most code shouldn't be affected.

The new Node superclass has methods for validating the node based on its keywords and for generating a default results display for its type.

To create a new node, use the new function pipeliner.node_factory.create_node instead of instantiating a Node object directly. It takes the same arguments as the old Node class, so:

> node = create_node("my_file.txt", "LogFile", ["kwd1", "kwd2"])
> node.type
LogFile.txt.kwd1.kwd2
> node.format
txt
> node.toplevel_type
LogFile
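
The factory pattern described above could be sketched roughly like this. It is a simplified illustration, not the actual pipeliner.node_factory code: the NODE_CLASSES registry, the subclass bodies, the FORMAT_ALIASES table, deriving the format from the file extension, and rejecting unknown toplevel types are all assumptions made for the example.

```python
import os
from typing import Dict, List, Optional, Type

# Assumed alias table, based on the 'mrc' for 'mrc', 'mrcs' or 'map' example.
FORMAT_ALIASES = {"mrcs": "mrc", "map": "mrc"}


class Node:
    """Simplified stand-in for the Node superclass (illustration only)."""

    toplevel_type = "Node"
    toplevel_description = "Generic node"

    def __init__(self, name: str, kwds: Optional[List[str]] = None):
        self.name = name
        self.kwds = list(kwds) if kwds else []
        # Assumption: the generalised format comes from the file extension.
        ext = os.path.splitext(name)[1].lstrip(".")
        self.format = FORMAT_ALIASES.get(ext, ext)
        # Full type: toplevel type, then format, then keywords, dot-separated.
        parts = [self.toplevel_type, self.format] + self.kwds
        self.type = ".".join(p for p in parts if p)


class LogFileNode(Node):
    toplevel_type = "LogFile"
    toplevel_description = "A log file"


class ParticlesDataNode(Node):
    toplevel_type = "ParticlesData"
    toplevel_description = "A set of particles"


# Hypothetical registry mapping toplevel type names to their subclasses.
NODE_CLASSES: Dict[str, Type[Node]] = {
    cls.toplevel_type: cls for cls in (LogFileNode, ParticlesDataNode)
}


def create_node(
    name: str, toplevel_type: str, kwds: Optional[List[str]] = None
) -> Node:
    """Return an instance of the subclass registered for toplevel_type."""
    try:
        node_class = NODE_CLASSES[toplevel_type]
    except KeyError:
        # Assumption: unknown types are rejected; the real factory may differ.
        raise ValueError(f"Unknown toplevel node type: {toplevel_type}")
    return node_class(name, kwds)


node = create_node("my_file.txt", "LogFile", ["kwd1", "kwd2"])
print(node.type, node.format, node.toplevel_type)
```

With a registry like this, callers never need to know which subclass they get back; they only pass the toplevel type as a string.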

Nodes are validated on creation, but validation can be turned off by adding a do_validation=False arg. This is used when the pipeline is read, because it would be too slow to re-validate every node each time a large pipeline is read, and the nodes were already validated upon creation.
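
A rough sketch of how such a flag might gate validation. Only the do_validation argument name comes from this MR; the validate method, the validated attribute, and the empty-name check are placeholders invented for the example.

```python
class Node:
    """Simplified stand-in for a node (illustration only)."""

    def __init__(self, name, toplevel_type, kwds=None, do_validation=True):
        self.name = name
        self.toplevel_type = toplevel_type
        self.kwds = list(kwds) if kwds else []
        self.validated = False  # hypothetical bookkeeping attribute
        if do_validation:
            self.validate()

    def validate(self):
        # Placeholder check; real validation would inspect the file itself.
        if not self.name:
            raise ValueError("Node name must not be empty")
        self.validated = True


def create_node(name, toplevel_type, kwds=None, do_validation=True):
    return Node(name, toplevel_type, kwds, do_validation=do_validation)


# A node made by a running job is validated when it is created...
new_node = create_node("run.log", "LogFile")

# ...while a node rebuilt from an existing pipeline file skips the check.
reread_node = create_node("run.log", "LogFile", do_validation=False)
```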

PipelinerJob has been updated with a new default create_results_display function. It uses the default_results_display method of each node in self.output_nodes and returns these ResultsDisplayObjects. So now jobs only need to redefine this function if they want more complex results displays or don't want to display all of the output nodes.

I added default_results_display methods for most of the node types, but some have too many possible formats, EX 'LogFile', so these default to a ResultsDisplayText that says 'This node type has no default display'.
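
The interaction between the job's default create_results_display and the per-node default_results_display methods might look something like the sketch below. The ResultsDisplayText fields, the node subclass bodies, and ParticlesOnlyJob are assumptions for illustration and are not taken from the pipeliner code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResultsDisplayText:
    """Stand-in for the pipeliner's text display object (fields are assumed)."""

    title: str
    display_data: str


class Node:
    toplevel_type = "Node"

    def __init__(self, name: str):
        self.name = name

    def default_results_display(self) -> ResultsDisplayText:
        # Fallback used by node types without a sensible default display.
        return ResultsDisplayText(
            title=self.name,
            display_data="This node type has no default display",
        )


class LogFileNode(Node):
    toplevel_type = "LogFile"  # too many possible formats: keeps the fallback


class ParticlesDataNode(Node):
    toplevel_type = "ParticlesData"

    def default_results_display(self) -> ResultsDisplayText:
        # Placeholder display; a real one would summarise the file contents.
        return ResultsDisplayText(self.name, f"Particles file: {self.name}")


class PipelinerJob:
    def __init__(self) -> None:
        self.output_nodes: List[Node] = []

    def create_results_display(self) -> list:
        # Default behaviour: one display object per output node.
        return [node.default_results_display() for node in self.output_nodes]


class ParticlesOnlyJob(PipelinerJob):
    """Hypothetical job that overrides the default to hide some nodes."""

    def create_results_display(self) -> list:
        return [
            node.default_results_display()
            for node in self.output_nodes
            if node.toplevel_type == "ParticlesData"
        ]
```

Only jobs like the hypothetical ParticlesOnlyJob, which want something other than one display per output node, need to override the method.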

I haven't added keyword_validation methods to any of the node types yet. These could get very complex, EX 'if format=star and "relion" in the keywords the starfile must have x, y and z fields', and this method isn't really used yet. In the future it could be used to validate an InputNodeJobOption in the GUI.
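
To make the example rule above concrete, a keyword check of that kind could be sketched as below. Nothing like this exists in the MR: the function signature, the error message format, and the idea of passing the file's field names in directly (rather than parsing a star file) are all assumptions, and 'x', 'y' and 'z' are just the placeholder field names from the example.

```python
from typing import Iterable, List

# Placeholder field names taken from the example rule, not real star labels.
REQUIRED_RELION_FIELDS = {"x", "y", "z"}


def keyword_validation(
    node_format: str, kwds: Iterable[str], fields: Iterable[str]
) -> List[str]:
    """Return a list of error messages; an empty list means the node passed."""
    errors: List[str] = []
    if node_format == "star" and "relion" in list(kwds):
        missing = sorted(REQUIRED_RELION_FIELDS - set(fields))
        if missing:
            errors.append("Missing fields: " + ", ".join(missing))
    return errors


print(keyword_validation("star", ["relion", "refined"], ["x", "y"]))
```

Returning a list of messages rather than raising would let a GUI show every problem with an input node at once, though either design would work.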

This MR doesn't include the node renaming agreed upon at the RAL hackathon; that will be done under a separate MR.

Closes #165 (closed)

Edited Mar 10, 2023 by Matthew Iadanza
