  • Matthew Iadanza's avatar
    Update to node naming system · 54c52a7e
    Matthew Iadanza authored
    The second term in a node description has been generalised. `mymap.map`, `mymap.mrc`, and `mymap.mrcs` now all get the node type `DensityMap.mrc` because they are all MRC files with different extensions. A `Node` no longer needs to specify its extension; it is now determined automatically.
    
    **Direct changes:**
    
    Creating a `Node` has been moved from `pipeliner.data_structure` to `pipeliner.node_factory`. A `Node` is now created with `Node(filename, toplevel_type, [kwds])`.
    
    
    old way:
    `Node("mymap.mrc", "DensityMap.mrc.relion.halfmap")`
    
    new way:
    `Node("mymap.mrc", "DensityMap", ["relion", "halfmap"])`
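
When migrating existing calls, an old-style type string can be split into the two new arguments. The following is a minimal sketch, assuming old strings always follow the `TopLevel.ext.kwd1.kwd2` pattern; `split_old_node_type` is a hypothetical helper and not part of the pipeliner.

```python
from typing import List, Tuple


def split_old_node_type(old_type: str) -> Tuple[str, List[str]]:
    """Split an old-style node type into (toplevel_type, kwds).

    Hypothetical migration helper, not part of the pipeliner. The extension
    (second term) is dropped because it is now determined from the file name.
    """
    parts = old_type.split(".")
    toplevel = parts[0]
    kwds = parts[2:]  # skip parts[1], the extension term
    return toplevel, kwds


toplevel, kwds = split_old_node_type("DensityMap.mrc.relion.halfmap")
print(toplevel, kwds)  # DensityMap ['relion', 'halfmap']
```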
    
    `InputNodeJobOption`, `FileNameJobOption`, and `MultiFileNameJobOption` now take the top-level node type for `node_type` and have an added `node_kwds` attribute that takes the list of keywords.
    
    
    All of the top-level node types are defined as constants in `pipeliner.data_structure`. I've used these constants wherever a top-level type appears in the code, to avoid typos and confusion about the top-level names (e.g. `MicrographMoviesData` vs `MicrographsMoviesData`).
    
    **Background changes:**
    
    The `node_factory` looks up the specified top-level type in `nodetype_converter` and compares its allowed extensions with the extension of the file.
    
    `nodetype_converter` for `DensityMap` top-level type:
    
    ```
        NODE_DENSITYMAP: (
            "A 3D cryoEM density map (could be half map, full map, sharpened etc)",
            GenNodeTypeConverter(
                toplevel_name="DensityMap",
                allowed_ext={("mrc", "mrcs", "map"): "mrc"},
                check_funct={"mrc": check_file_is_mrc},
            ),
        ),
    ``` 
    
    If the file extension appears in one of the `tuple` keys of `allowed_ext`, it is converted to the value for that key. Otherwise the extension is used as is.
    
    ```
    IE: 
    mymap.mrc -> DensityMap.mrc
    mymap.mrcs -> DensityMap.mrc
    mymap.xyz -> DensityMap.xyz
    ```
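
The extension conversion can be sketched roughly as follows. This is an illustration only: `normalise_ext` is a hypothetical name, the real logic lives in `node_factory` and may differ, and the sketch assumes extensions are compared case-sensitively.

```python
from pathlib import Path
from typing import Dict, Tuple

# Mirrors the `allowed_ext` entry for the DensityMap top-level type.
ALLOWED_EXT: Dict[Tuple[str, ...], str] = {("mrc", "mrcs", "map"): "mrc"}


def normalise_ext(filename: str, allowed_ext: Dict[Tuple[str, ...], str]) -> str:
    """Return the extension term for a node type (hypothetical helper)."""
    ext = Path(filename).suffix.lstrip(".")
    for group, canonical in allowed_ext.items():
        if ext in group:
            return canonical
    return ext  # not in any tuple key: used as is


for name in ("mymap.mrc", "mymap.mrcs", "mymap.xyz"):
    print(f"{name} -> DensityMap.{normalise_ext(name, ALLOWED_EXT)}")
```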
    
    If a `check_funct` exists for a specific extension, that function is run to check that the file is actually the type it claims to be. If the check fails, the extension in the node type is bracketed with `X` to denote that the file is not valid for that extension.

    For example: `DensityMap.XmrcX`
    
    If the file does not exist yet, it is given a pass on this check.
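
A rough sketch of the check step, including the pass given to files that do not exist yet. The names here are hypothetical, and it assumes each check function takes a file path and returns `True` for a valid file; the actual signature of `check_file_is_mrc` may differ.

```python
import os
from typing import Callable, Dict


def checked_ext(
    filename: str, ext: str, check_funct: Dict[str, Callable[[str], bool]]
) -> str:
    """Return `ext`, or `X<ext>X` if the file exists and fails its check.

    Hypothetical helper; assumes each check function takes a file path and
    returns True when the file is valid.
    """
    check = check_funct.get(ext)
    if check is None or not os.path.isfile(filename):
        return ext  # no check registered, or file not written yet
    return ext if check(filename) else f"X{ext}X"


def always_fails(path: str) -> bool:
    """Stand-in for a real validator such as `check_file_is_mrc`."""
    return False


# A file that has not been written yet is not checked.
print(checked_ext("not_written_yet.mrc", "mrc", {"mrc": always_fails}))
```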
    
    If the top-level node type is unknown to the pipeliner, it takes everything at face value.

    For example:
    
    `Node("myfile.xyz", "FakeNodeType", ["kwd1", "kwd2"])`
    
    yields a `Node` of type `FakeNodeType.xyz.kwd1.kwd2`
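
How the full type name is assembled for known and unknown top-level types can be sketched like this. `KNOWN_TYPES` and `node_type_name` are hypothetical stand-ins for `nodetype_converter` and the factory logic, and file checks are omitted.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical stand-in for `nodetype_converter`: top-level type -> allowed_ext.
KNOWN_TYPES: Dict[str, Dict[Tuple[str, ...], str]] = {
    "DensityMap": {("mrc", "mrcs", "map"): "mrc"},
}


def node_type_name(filename: str, toplevel: str, kwds: List[str]) -> str:
    """Assemble `TopLevel.ext.kwd1.kwd2...` (sketch, file checks omitted)."""
    ext = Path(filename).suffix.lstrip(".")
    # Unknown top-level types have no conversion table, so nothing is changed.
    for group, canonical in KNOWN_TYPES.get(toplevel, {}).items():
        if ext in group:
            ext = canonical
            break
    return ".".join([toplevel, ext, *kwds])


print(node_type_name("myfile.xyz", "FakeNodeType", ["kwd1", "kwd2"]))
# FakeNodeType.xyz.kwd1.kwd2
```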
    
    
    This closes #140 and sorts some of the issues raised in #131
    
    Closes #140
    
    !159