Resolve "Improve invocation in a cluster environment (specifically: with slurm)"

Yunus Sevinchan requested to merge 91-distributed-multiverse into main

This MR implements a DistributedMultiverse class that can restore a Multiverse from its meta_config and re-run this existing Multiverse. This is useful in combination with a NoWorkTask, which sets up the Multiverse but does not launch the run.

Details

class NoWorkTask(WorkerTask)

  • Spawns an inactive task
  • Update worker status / signal
  • Update streams
  • Debug flag to raise errors when forbidden functions are called
  • Tests
    • Run with a task that raises an error
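The debug-flag behaviour listed above could look roughly like the following. This is a minimal standalone sketch, not the implementation in this MR: the class name `InactiveTaskSketch`, its method names, and the `status` attribute are all hypothetical and do not reflect the actual WorkerTask interface.

```python
class InactiveTaskSketch:
    """Hypothetical stand-in for a task that is set up but never spawns work."""

    def __init__(self, name: str, *, debug: bool = False):
        self.name = name
        self.debug = debug          # if True, forbidden calls raise
        self.status = "inactive"    # assumed status label

    def _forbidden(self, what: str):
        # In debug mode, surface accidental use loudly; otherwise do nothing.
        if self.debug:
            raise RuntimeError(
                f"{what}() must not be called on inactive task '{self.name}'"
            )
        return None

    def spawn_worker(self):
        # Assumed method name; a real task would start a process here.
        return self._forbidden("spawn_worker")

    def signal_worker(self, signal: str):
        # Assumed method name; there is no process to signal.
        return self._forbidden("signal_worker")
```

With `debug=False` the forbidden calls are silent no-ops, which keeps a normal setup run quiet, while `debug=True` turns any accidental call into an error that a test can assert on.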

class DistributedMultiverse(FrozenMultiverse)

  • Load meta_cfg from file
  • Forbidden to update / change meta_cfg
  • create Multiverse
  • run_selection(...)
  • run_all(..)
  • Prepare executable, possibly from backup
  • Update: a universe counts as existing if its uni folder contains any file other than config.yml
  • create config for non-existing universes
  • --clear-existing option
  • --skip-existing option (Difficult to implement and run_selection(..) should not have a skip existing option. run(..) behaves as in Multiverse class.)
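The "universe exists" rule from the list above can be illustrated with a small helper. This is a sketch under assumptions: the function name `universe_exists` and its `ignore` parameter are made up for illustration, and the actual check in DistributedMultiverse may differ.

```python
from pathlib import Path


def universe_exists(uni_dir: Path, *, ignore: tuple = ("config.yml",)) -> bool:
    """Return True if uni_dir contains any entry other than the ignored names.

    A missing directory, an empty one, or one holding only config.yml is
    treated as "not existing", i.e. the universe was set up but never run.
    """
    if not uni_dir.is_dir():
        return False
    return any(entry.name not in ignore for entry in uni_dir.iterdir())
```

A caller could use this to decide whether a universe needs its config created, should be cleared first, or can be run directly.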

CLI utopya run MODEL --no-work

CLI utopya run-existing MODEL PATH --uni uni001 -u 002

  • Mark experimental / advanced feature

Anything to double-check?

  1. Can we "forbid" changes to the meta config?
  2. Should we remove the run() function that DistributedMV inherited from MV?
  3. Can we imply --no-eval when calling run MODEL --no-work?
  4. Is forbidding updates / changes to meta_cfg sufficient?
  5. Experimental feature annotation
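Regarding points 1 and 4, one possible way to make a loaded meta config read-only is to wrap it recursively in read-only views. This is only a sketch of the idea, not what this MR does; the helper name `freeze` is hypothetical, and the approach relies on the standard library's `types.MappingProxyType`.

```python
from types import MappingProxyType


def freeze(obj):
    """Recursively wrap dicts in read-only views and turn lists into tuples."""
    if isinstance(obj, dict):
        # Copy into a new dict so later changes to the original do not leak in.
        return MappingProxyType({k: freeze(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return tuple(freeze(v) for v in obj)
    return obj
```

Item assignment on the result, at any nesting level, raises a `TypeError`, while reads behave like those on a normal dict. The protection is by convention only: code that keeps a reference to the original dict can still change that original.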

Can this MR be accepted?

  • Implementation ready
  • Tests added or adjusted
  • Documentation extended or updated
  • Code quality

Related issues

Closes #91 (closed)

Edited by Yunus Sevinchan
