WIP: Initial sketch for "polaris batch" command.

Hugh Brown requested to merge feature/issue-54-support-for-cron-rewrite into master

What's here?

This MR sketches out initial work for a polaris batch command, which is meant to:

  • support non-interactive fetch and learn...
  • ...using a configuration file which supports global or per-satellite settings...
  • ...and records (cough will record) the latest data that has been fetched...
  • ...thus allowing a "fetch all the latest data and update our models automagically" workflow.

I do not propose to merge this work in this state; instead, this is to get people's opinions on what I'm proposing. Read on, and see if you like the direction this is going. If so, cool -- I'll finish this up, tidy things a lot, and submit a proper MR. If not, let me know why and let's figure out if there's a way to make it better.

How does it work?

Configuration is intended to be read from a configuration file. There's an example file included in the polaris/batch directory:

# Settings in "DEFAULT" can be overridden in individual satellite settings.
[DEFAULT]
# By default, everything will be placed under polaris_root_dir in per-satellite
# directories, named after the title of the section.
# Example:
# polaris_batch
# ├── lightsail2
# │   ├── cache -- fetched data and normalized frames go here
# │   ├── graph -- graphs will be put here
# │   └── log   -- where we record last fetched data, last run, etc
# └── lightsail2_new_learner
#     ├── cache -- as above
#     ├── graph
#     └── log
polaris_root_dir = /tmp/polaris_batch

# Satellite section
# The title of the section is a "friendly" name.
[lightsail2]
# The "name" argument is the name of the normalizer; it's the same argument
# that would be passed to "polaris fetch".
name = LightSail-2

# Perhaps you could have two different analyses done for the same satellite
#[lightsail2_new_learner]
# name = LightSail-2
# In the future we can have different learn arguments here as well
# learn_args = -l logistic_regression
# We could override individual paths if we wanted to
# cache_dir = /home/aardvark/polaris/lightsail2_new_learner/cache
# graph_dir = /home/aardvark/polaris/lightsail2_new_learner/graph
# log_dir = /home/aardvark/polaris/lightsail2_new_learner/log
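
To illustrate how the DEFAULT settings and per-satellite overrides could combine, here is a minimal sketch using Python's standard configparser module. It is not the code in this MR: the function name resolve_satellite_settings and the inlined EXAMPLE_CONFIG string are made up for illustration, and it assumes the cache, graph and log directories fall back to subdirectories of polaris_root_dir named after the section title, as the comments in the example file describe.

```python
import configparser
from pathlib import PurePosixPath

# Trimmed copy of the example file, inlined so the sketch is self-contained.
EXAMPLE_CONFIG = """
[DEFAULT]
polaris_root_dir = /tmp/polaris_batch

[lightsail2]
name = LightSail-2

[lightsail2_new_learner]
name = LightSail-2
cache_dir = /home/aardvark/polaris/lightsail2_new_learner/cache
"""


def resolve_satellite_settings(config_text):
    """Return {section title: settings dict} (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    settings = {}
    # sections() skips DEFAULT, but DEFAULT values are visible in every section.
    for section in parser.sections():
        values = parser[section]
        # Assumption: per-satellite directories live under the root directory,
        # in a directory named after the section title.
        base = PurePosixPath(values["polaris_root_dir"]) / section
        resolved = {"name": values["name"]}
        for kind in ("cache", "graph", "log"):
            key = kind + "_dir"
            # An explicit cache_dir/graph_dir/log_dir wins over the derived path.
            resolved[key] = values.get(key, fallback=str(base / kind))
        settings[section] = resolved
    return settings


if __name__ == "__main__":
    for section, resolved in resolve_satellite_settings(EXAMPLE_CONFIG).items():
        print(section, resolved)
```

The point of the sketch is that configparser already gives the "global or per-satellite" behaviour for free: anything set in DEFAULT shows up in every section unless that section sets its own value.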

The polaris batch command itself takes two options:

  • --config-file [path to config file]
  • --dry-run, to show what would be done; it prints the various steps and commands, but doesn't run them.
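
As a rough illustration of those two options, here is a sketch built on the standard library's argparse and subprocess modules. The command-line wiring in this MR may well look different; build_parser and run_step are hypothetical names, and whether --config-file is required is left open here because the description above doesn't say.

```python
import argparse
import subprocess


def build_parser():
    """Parser exposing the two options described above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="polaris batch")
    parser.add_argument(
        "--config-file",
        metavar="PATH",
        help="path to the batch configuration file",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="print the steps and commands without running them",
    )
    return parser


def run_step(command, dry_run):
    """Print a command, and run it only when dry_run is False."""
    prefix = "Would run:" if dry_run else "Running:"
    print(prefix, " ".join(command))
    if dry_run:
        return None
    # check=True raises CalledProcessError if the command exits non-zero.
    return subprocess.run(command, check=True)
```

Funnelling every external command through one helper like run_step is one way to make sure --dry-run is honoured everywhere, rather than checking the flag at each call site.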

Shortcomings, accusations and TODOs

  • This is not perfect code, and could do with a refactoring. However, I wanted to get it in front of people for feedback early.

  • Logging actions and determining the date of the last successful fetch are not yet done.

  • Directory creation code does not (yet) completely work, and does not (yet) respect the --dry-run flag.
