WIP: Initial sketch for "polaris batch" command.
What's here?
This MR sketches out initial work for a polaris batch command, which is meant to:
- support non-interactive fetch and learn...
- ...using a configuration file which supports global or per-satellite settings...
- ...and records (cough will record) the latest data that has been fetched...
- ...thus allowing a "fetch all the latest data and update our models automagically" workflow.
I do not propose to merge this work in this state; instead, this is to get people's opinions on what I'm proposing. Read on, and see if you like the direction this is going. If so, cool -- I'll finish this up, tidy things a lot, and submit a proper MR. If not, let me know why and let's figure out if there's a way to make it better.
How does it work?
Configuration is intended to be read from a configuration file. There's an example file included in the polaris/batch directory:
# Settings in "DEFAULT" can be overridden in individual satellite settings.
[DEFAULT]
# By default, everything will be placed under polaris_root_dir in per-satellite
# directories, named after the title of the section.
# Example:
# polaris_batch
# ├── lightsail2
# │   ├── cache -- fetched data and normalized frames go here
# │   ├── graph -- graphs will be put here
# │   └── log   -- where we record last fetched data, last run, etc
# └── lightsail2_new_learner
#     ├── cache -- as above
#     ├── graph
#     └── log
polaris_root_dir = /tmp/polaris_batch
# Satellite section
# The title of the section is a "friendly" name.
[lightsail2]
# The "name" argument is the name of the normalizer; it's the same argument
# that would be passed to "polaris fetch".
name = LightSail-2
# Perhaps you could have two different analyses done for the same satellite
#[lightsail2_new_learner]
# name = LightSail-2
# In the future we can have different learn arguments here as well
# learn_args = -l logistic_regression
# We could overwrite individual paths if we wanted to
# cache_dir = /home/aardvark/polaris/lightsail2_new_learner/cache
# graph_dir = /home/aardvark/polaris/lightsail2_new_learner/graph
# log_dir = /home/aardvark/polaris/lightsail2_new_learner/log
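To make the override behaviour concrete, here is a minimal sketch of how such a file could be read with Python's standard `configparser`. It is an illustration only, not the code in this MR: the helper name `resolve_satellite_dirs` and the shape of the returned dictionary are assumptions, and the inline `EXAMPLE` string is a trimmed copy of the file above with the second section uncommented.

```python
import configparser
from pathlib import Path

# Trimmed copy of the example config; the second section is uncommented
# here purely to show a per-satellite override.
EXAMPLE = """
[DEFAULT]
polaris_root_dir = /tmp/polaris_batch

[lightsail2]
name = LightSail-2

[lightsail2_new_learner]
name = LightSail-2
cache_dir = /home/aardvark/polaris/lightsail2_new_learner/cache
"""


def resolve_satellite_dirs(config):
    """Hypothetical helper: map each satellite section to its settings.

    Directories default to <polaris_root_dir>/<section>/<kind> unless the
    section sets cache_dir, graph_dir or log_dir explicitly.
    """
    resolved = {}
    # sections() does not list DEFAULT, but every section inherits its values.
    for section in config.sections():
        settings = config[section]
        base = Path(settings["polaris_root_dir"]) / section
        entry = {"name": settings["name"]}
        for kind in ("cache", "graph", "log"):
            key = f"{kind}_dir"
            entry[key] = Path(settings.get(key, fallback=str(base / kind)))
        resolved[section] = entry
    return resolved


config = configparser.ConfigParser()
config.read_string(EXAMPLE)
dirs = resolve_satellite_dirs(config)

for section, entry in dirs.items():
    print(section, entry["name"], entry["cache_dir"])
```

Because `configparser` copies everything in `[DEFAULT]` into every other section, a satellite section only has to list the values it wants to change.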
The polaris batch command itself takes two arguments:
- --config-file [path to config file]
- --dry-run to show what would be done; it will print out the various steps and commands, but won't run them.
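As a rough sketch of the two flags described above, the snippet below parses them with `argparse` and routes every step through one function that prints the command and only executes it when not in dry-run mode. This is a stand-in, not the MR's implementation: `build_parser`, `run_step`, the config file name, the choice to make `--config-file` required, and the `echo` placeholder command are all assumptions made for illustration, and no real polaris invocation is shown.

```python
import argparse
import shlex
import subprocess


def build_parser():
    """Hypothetical parser mirroring the two documented arguments."""
    parser = argparse.ArgumentParser(prog="polaris batch")
    # Assumption: the config file is mandatory; the MR does not say.
    parser.add_argument("--config-file", required=True,
                        help="path to config file")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the steps and commands without running them")
    return parser


def run_step(description, command, dry_run):
    """Print a step; run it only when dry_run is False."""
    print(f"{description}: {shlex.join(command)}")
    if dry_run:
        return None
    return subprocess.run(command, check=True)


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--config-file", "polaris_batch.ini", "--dry-run"]
)

# Placeholder command; the real fetch/learn command lines are not shown here.
result = run_step("fetch for lightsail2",
                  ["echo", "placeholder fetch step"],
                  args.dry_run)
```

Funnelling every side effect through a single function like `run_step` is one way to keep `--dry-run` honest, since nothing can run without passing the flag check.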
Shortcomings, accusations and TODOs
- This is not perfect code, and could do with a refactoring. However, I wanted to get it in front of people for feedback early.
- Logging actions, and determining the date of last successful fetch, is not yet done.
- Directory creation code does not (yet) completely work, and does not (yet) respect the --dry-run flag.