Hugh Brown requested to merge feature/issue-54-support-for-cron-rewrite into master Dec 14, 2019

What's here?

This MR sketches out initial work for a polaris batch command, which is meant to:

support non-interactive fetch and learn...
...using a configuration file which supports global or per-satellite settings...
...and records (cough, will record) the latest data that has been fetched...
...thus allowing a "fetch all the latest data and update our models automagically" workflow.

I do not propose to merge this work in this state; instead, this is to get people's opinions on what I'm proposing. Read on, and see if you like the direction this is going. If so, cool -- I'll finish this up, tidy things a lot, and submit a proper MR. If not, let me know why and let's figure out if there's a way to make it better.

How does it work?

Configuration is intended to be read from a configuration file. There's an example file included in the polaris/batch directory:

# Settings in "DEFAULT" can be overridden in individual satellite settings.
[DEFAULT]
# By default, everything will be placed under polaris_root_dir in per-satellite
# directories, named after the title of the section.
# Example:
# polaris_batch
# ├── lightsail2
# │   ├── cache -- fetched data and normalized frames go here
# │   ├── graph -- graphs will be put here
# │   └── log   -- where we record last fetched data, last run, etc
# └── lightsail2_new_learner
#     ├── cache -- as above
#     ├── graph
#     └── log
polaris_root_dir = /tmp/polaris_batch

# Satellite section
# The title of the section is a "friendly" name.
[lightsail2]
# The "name" argument is the name of the normalizer; it's the same argument
# that would be passed to "polaris fetch".
name = LightSail-2

# Perhaps you could have two different analyses done for the same satellite
#[lightsail2_new_learner]
# name = LightSail-2
# In the future we can have different learn arguments here as well
# learn_args = -l logistic_regression
# We could overwrite individual paths if we wanted to
# cache_dir = /home/aardvark/polaris/lightsail2_new_learner/cache
# graph_dir = /home/aardvark/polaris/lightsail2_new_learner/graph
# log_dir = /home/aardvark/polaris/lightsail2_new_learner/log
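One way to read a file like this is Python's standard configparser module, whose DEFAULT section already gives the override behaviour described above: every key in [DEFAULT] is visible in each satellite section unless that section sets its own value. The following is a minimal sketch under that assumption, not the code in this MR. The helper name satellite_settings is hypothetical, and the embedded config un-comments part of the second example section to show a per-satellite path override.

```python
import configparser
import os

# A trimmed copy of the example configuration, with the second section
# un-commented to demonstrate a per-satellite override.
EXAMPLE_CONFIG = """
[DEFAULT]
polaris_root_dir = /tmp/polaris_batch

[lightsail2]
name = LightSail-2

[lightsail2_new_learner]
name = LightSail-2
graph_dir = /home/aardvark/polaris/lightsail2_new_learner/graph
"""


def satellite_settings(config):
    """Hypothetical helper: resolve name and directories per satellite section.

    config.sections() skips [DEFAULT], but every section still sees its keys,
    so polaris_root_dir can be looked up directly in each section.
    """
    settings = {}
    for section in config.sections():
        sat = config[section]
        # Per-satellite directory, named after the section title.
        sat_dir = os.path.join(sat["polaris_root_dir"], section)
        settings[section] = {
            "name": sat["name"],
            # Use an explicit path if the section sets one, else the default layout.
            "cache_dir": sat.get("cache_dir", os.path.join(sat_dir, "cache")),
            "graph_dir": sat.get("graph_dir", os.path.join(sat_dir, "graph")),
            "log_dir": sat.get("log_dir", os.path.join(sat_dir, "log")),
        }
    return settings


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
for section, values in satellite_settings(config).items():
    print(section, values)
```

With this approach the friendly section title only decides the directory name, while "name" is what gets handed to the fetch step.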

The polaris batch command itself takes two arguments:

--config-file [path to config file]
--dry-run to show what would be done; it will print out the various steps and commands, but won't run them.
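The dry-run behaviour amounts to routing every side effect (running a command, creating a directory) through a helper that either performs it or just prints it. Below is a rough sketch of that idea, assuming argparse as a stand-in for however the real command is wired into the polaris CLI. The helper names make_dir and run_step, and the config file name polaris_batch.ini, are hypothetical. The fetch command is shown with only the normalizer name; the real invocation needs more arguments, which are deliberately not guessed here.

```python
import argparse
import os
import shlex
import subprocess


def parse_args(argv=None):
    """Parse the two options described for "polaris batch"."""
    parser = argparse.ArgumentParser(prog="polaris batch")
    parser.add_argument("--config-file", required=True,
                        help="path to the batch configuration file")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the steps and commands without running them")
    return parser.parse_args(argv)


def make_dir(path, dry_run):
    """Create a directory, or only report it when dry_run is set."""
    if dry_run:
        print(f"would create directory {path}")
        return
    os.makedirs(path, exist_ok=True)


def run_step(cmd, dry_run):
    """Run one command (failing loudly on error), or only print it when dry_run is set."""
    if dry_run:
        print("would run:", shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    args = parse_args(["--config-file", "polaris_batch.ini", "--dry-run"])
    make_dir("/tmp/polaris_batch/lightsail2/cache", args.dry_run)
    # Illustrative only: real fetch arguments beyond the name are omitted.
    run_step(["polaris", "fetch", "LightSail-2"], args.dry_run)
```

Routing directory creation through the same dry_run check is one way to address the TODO below about directory creation ignoring --dry-run.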

Shortcomings, accusations and TODOs

This is not perfect code, and could do with a refactoring. However, I wanted to get it in front of people for feedback early.
Logging actions and determining the date of the last successful fetch are not yet done.
Directory creation code does not (yet) completely work, and does not (yet) respect the --dry-run flag.
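Recording the last successful fetch could be as small as one file per satellite in its log directory. The sketch below is purely an assumption about how that might look: the file name last_fetch.json, the JSON layout, and the helpers record_last_fetch and read_last_fetch are all hypothetical, since this part has not been written yet.

```python
import datetime
import json
import os

# Hypothetical file name; no format has been settled on yet.
LAST_FETCH_FILE = "last_fetch.json"


def record_last_fetch(log_dir, fetched_until, dry_run=False):
    """Store the end date of the most recent successful fetch."""
    path = os.path.join(log_dir, LAST_FETCH_FILE)
    if dry_run:
        print(f"would record last fetch {fetched_until.isoformat()} in {path}")
        return
    os.makedirs(log_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump({"last_fetch": fetched_until.isoformat()}, f)


def read_last_fetch(log_dir):
    """Return the recorded datetime, or None if nothing has been fetched yet."""
    path = os.path.join(log_dir, LAST_FETCH_FILE)
    try:
        with open(path) as f:
            return datetime.datetime.fromisoformat(json.load(f)["last_fetch"])
    except FileNotFoundError:
        return None
```

A batch run would then start each fetch from read_last_fetch(log_dir), falling back to some configured start date when it returns None, and call record_last_fetch only after the fetch succeeds.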

WIP: Initial sketch for "polaris batch" command.
