Resolve "Refactor the Node class to a factory function with Node subclass for each toplevel type"

Matthew Iadanza requested to merge 165-add-node-factory into main

Significant refactor of the nodes system. The old Node class is now a superclass with individual subclasses for each toplevel node type.

Node attributes:

  • name (str): The name of the file the node represents
  • toplevel_type (str): The toplevel node type, e.g. 'ParticlesData'
  • toplevel_description (str): A description of the toplevel type
  • type (str): The full node type, e.g. 'ParticlesData.star.relion.refined'
  • format (str): The node's generalised file format, e.g. 'mrc' for files with the extension 'mrc', 'mrcs' or 'map'
  • kwds (list[str]): Keywords associated with the node
  • output_from_process (Process): The Process object for the process that created the file
  • input_for_processes_list (list): A list of Process objects for processes that use the node as an input
  • format_converter (Optional[GenNodeFormatConverter]): An object that defines the expected extensions for the node and how to validate them

All of the original Node class's attributes are preserved with the same names, except ext, which was replaced with format, so most code shouldn't be affected.

The new Node superclass has methods for validating the node based on its keywords and for generating a default results display for the node type.
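A rough sketch of the superclass/subclass layout described above, to show how the attributes relate to each other. This is not the pipeliner code: the class bodies, the GENERAL_FORMATS lookup, the ParticlesDataNode class name and the description strings are all assumptions made for illustration.

```python
import os
from typing import List, Optional

# Assumption: related extensions collapse into one generalised format.
GENERAL_FORMATS = {"mrc": "mrc", "mrcs": "mrc", "map": "mrc"}


class Node:
    """Simplified stand-in for the Node superclass (not the pipeliner code)."""

    toplevel_type = "Node"
    toplevel_description = "A generic node"  # placeholder text

    def __init__(self, name: str, kwds: Optional[List[str]] = None) -> None:
        self.name = name
        self.kwds = list(kwds or [])
        ext = os.path.splitext(name)[1].lstrip(".")
        self.format = GENERAL_FORMATS.get(ext, ext)
        # Full type = toplevel type + format + keywords, joined with dots
        parts = [self.toplevel_type, self.format, *self.kwds]
        self.type = ".".join(part for part in parts if part)
        self.output_from_process = None  # would hold the creating Process
        self.input_for_processes_list: list = []


class ParticlesDataNode(Node):
    """Hypothetical subclass for one toplevel type."""

    toplevel_type = "ParticlesData"
    toplevel_description = "A set of particles"  # placeholder text


node = ParticlesDataNode("particles.star", ["relion", "refined"])
print(node.type)  # ParticlesData.star.relion.refined
```

Each subclass only needs to set its toplevel type and description in this sketch; everything else is inherited from the superclass.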

To create a new node, use the new function pipeliner.node_factory.create_node instead of instantiating a Node object directly. It takes the same arguments as the old Node class, so:

> node = create_node("my_file.txt", "LogFile", ["kwd1", "kwd2"])
> node.type
LogFile.txt.kwd1.kwd2
> node.format
txt
> node.toplevel_type
LogFile

Nodes are validated on creation, but validation can be turned off by passing do_validation=False. This is used when the pipeline is read: the nodes were already validated when they were created, and re-validating every node each time a large pipeline is read would be too slow.
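The factory pattern and the do_validation switch could look roughly like the sketch below. It is a minimal illustration under assumptions, not the contents of pipeliner.node_factory: the NODE_CLASSES lookup, the create_node_sketch name, the validated flag and the placeholder validation rule are all hypothetical.

```python
from typing import Dict, List, Optional, Type


class Node:
    """Simplified stand-in for the Node superclass."""

    toplevel_type = "Node"

    def __init__(
        self, name: str, kwds: Optional[List[str]] = None, do_validation: bool = True
    ) -> None:
        self.name = name
        self.kwds = list(kwds or [])
        self.format = name.rsplit(".", 1)[-1] if "." in name else ""
        parts = [self.toplevel_type, self.format, *self.kwds]
        self.type = ".".join(part for part in parts if part)
        self.validated = False  # hypothetical flag, for illustration only
        if do_validation:
            self.validate()

    def validate(self) -> None:
        # Placeholder rule (assumption): a node must have a file extension
        if not self.format:
            raise ValueError(f"Cannot determine the format of {self.name}")
        self.validated = True


class LogFileNode(Node):
    toplevel_type = "LogFile"


# Hypothetical lookup from toplevel type to the matching subclass
NODE_CLASSES: Dict[str, Type[Node]] = {"LogFile": LogFileNode}


def create_node_sketch(
    name: str,
    toplevel_type: str,
    kwds: Optional[List[str]] = None,
    do_validation: bool = True,
) -> Node:
    """Pick the subclass for the toplevel type and instantiate it."""
    node_class = NODE_CLASSES.get(toplevel_type)
    if node_class is None:
        raise ValueError(f"Unknown toplevel node type: {toplevel_type}")
    return node_class(name, kwds, do_validation=do_validation)


# Skipping validation, as would be done when reading an existing pipeline
node = create_node_sketch("my_file.txt", "LogFile", ["kwd1"], do_validation=False)
print(node.type)  # LogFile.txt.kwd1
```

Keeping the flag on the factory means callers that read an existing pipeline can opt out in one place, while everything else gets validation by default.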

PipelinerJob has been updated with a new default create_results_display function. It calls the default_results_display method of each node in self.output_nodes and returns the resulting ResultsDisplayObjects. Jobs now only need to redefine this function if they want more complex results displays or don't want to display all of the output nodes.
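As an illustration of that default behaviour, here is a stripped-down sketch. The classes are stand-ins that only borrow the names used in this description; the ImageNode subclass and the ImagesOnlyJob override are hypothetical and their bodies are assumptions.

```python
from typing import List


class ResultsDisplayText:
    """Stand-in for a text-only results display object."""

    def __init__(self, text: str) -> None:
        self.text = text


class Node:
    def __init__(self, name: str) -> None:
        self.name = name

    def default_results_display(self) -> ResultsDisplayText:
        return ResultsDisplayText("This node type has no default display")


class ImageNode(Node):
    """Hypothetical node type with its own default display."""

    def default_results_display(self) -> ResultsDisplayText:
        return ResultsDisplayText(f"Image: {self.name}")


class PipelinerJob:
    def __init__(self) -> None:
        self.output_nodes: List[Node] = []

    def create_results_display(self) -> List[ResultsDisplayText]:
        # Default: one display per output node, built by the node itself
        return [node.default_results_display() for node in self.output_nodes]


class ImagesOnlyJob(PipelinerJob):
    """Hypothetical job that overrides the default to show only some nodes."""

    def create_results_display(self) -> List[ResultsDisplayText]:
        return [
            node.default_results_display()
            for node in self.output_nodes
            if isinstance(node, ImageNode)
        ]


job = PipelinerJob()
job.output_nodes = [ImageNode("avg.png"), Node("run.log")]
print([display.text for display in job.create_results_display()])
```

A job that is happy with one display per output node needs no code of its own; only jobs like the hypothetical ImagesOnlyJob have to override the method.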

I added default_results_display methods for most of the node types, but some have too many possible formats, e.g. 'LogFile', so these default to a ResultsDisplayText that says 'This node type has no default display'.

I didn't add keyword_validation methods to any of the node types yet. These could get very complex, e.g. 'if format=star and "relion" in the keywords the starfile must have x, y and z fields', and this method isn't really used yet. In the future it could be used to validate an InputNodeJobOption in the GUI.
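To give an idea of what such a rule might look like, here is a sketch of the star/relion example above written as a standalone function. Nothing like this exists in the MR: the function name, its signature and the idea of passing in the starfile's column names as a list are assumptions made purely for illustration.

```python
from typing import List, Sequence


def keyword_validation_sketch(
    node_format: str, kwds: Sequence[str], star_columns: Sequence[str]
) -> List[str]:
    """Return a list of problems; an empty list means the node passed."""
    errors: List[str] = []
    # Rule from the example: relion starfiles must have x, y and z fields
    if node_format == "star" and "relion" in kwds:
        missing = [col for col in ("x", "y", "z") if col not in star_columns]
        if missing:
            errors.append("Missing required fields: " + ", ".join(missing))
    return errors


print(keyword_validation_sketch("star", ["relion"], ["x", "y"]))
# ['Missing required fields: z']
```

Returning a list of messages rather than raising would let a GUI show every problem with an input node at once, but that is a design guess rather than anything decided here.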

This MR doesn't include the node renaming agreed upon at the RAL hackathon; that will be done in a separate MR.

Closes #165 (closed)

Edited by Matthew Iadanza