Implementation of a pre/post hook system to allow more customization

This issue discusses the implementation of a hooks system, which would allow running custom actions before and after the OLIP deployment.

Rationale

The OLIP project is intended to be used by many entities, each with specific needs. We don't all run the same hardware or software; we have different workflows, deployment requirements, etc.

pre-install step

At the very least, we can assume that everyone has to install an OS first (it wouldn't work very well otherwise). Then you might want to install your SSH keys, for instance.

We guess that most people have similar requirements.

As an example, BSF runs a specific playbook at install time to configure our remote access and log aggregation, amongst other things. Our integration tooling runs this playbook, then runs the olip-deploy one; both playbooks share a common set of arguments, built by our integration tooling.

post-install step

You might need to perform custom actions after OLIP is installed: for instance, pre-populating OLIP with a bunch of apps or installing some content. You may also want to register your device in your deployed devices database, etc.

The other need comes from the fact that we are all human beings to begin with (as far as I can guess). What all humans have in common is that they make mistakes. It is very likely that someday you will have something to fix in your pre-install step, which will require some clean up - we did, more often than we'd like to admit. :') ^^'

The problem here is that either you have to add those hotfixes right into the olip-deploy playbook (which you should not do, since OLIP is intended to be as generic as possible and not include your specific needs), or you have to run a post-install script of some sort - again.

the deployment args point

The OLIP go.sh install script requires a bunch of arguments. Chances are that some (if not all) of those arguments are common to the pre- and post-install steps. Having to maintain their parsing and propagation across three separate tools is not practical.

The proposal

OLIP may include a hooks system.

A --pre-hook argument would allow one to define a custom command to be run before the actual olip-deploy playbook runs.

That may be a command such as ansible-pull ... https://github.../olip-pre-install

A --post-hook argument would allow one to define a custom command to be run after the actual olip-deploy playbook has completed.

That may be a script that hits some API to register the device as actually installed.
A webhook can be called here, to inform the logistics team that the device is "ready to be shipped".
In a CI context, that may be a testinfra-based script performing some QA checks to ensure the device was actually configured as intended.
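To make the proposal more concrete, here is a minimal sketch of how go.sh might handle the two flags. It is not the actual go.sh code: run_deploy is a hypothetical placeholder for the real olip-deploy invocation, go_main is a made-up name, and for brevity the sketch assumes the hook flags are given before any other argument.

```shell
#!/bin/bash
# Sketch only: one way go.sh could handle --pre-hook / --post-hook.
# run_deploy is a hypothetical placeholder for the real olip-deploy call.
run_deploy() {
    echo "deploy $*"
}

go_main() {
    pre_hook=""
    post_hook=""

    # Simplification: hook flags must come before the other go.sh arguments.
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --pre-hook|--post-hook)
                if [ "$#" -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 2
                fi
                if [ "$1" = "--pre-hook" ]; then pre_hook=$2; else post_hook=$2; fi
                shift 2
                ;;
            *)
                break
                ;;
        esac
    done

    # Run the pre-hook; abort the deployment if it fails.
    if [ -n "$pre_hook" ]; then
        bash -c "$pre_hook" || return 1
    fi

    # Remaining arguments are passed through to the deployment untouched.
    run_deploy "$@" || return 1

    # Run the post-hook only once the deployment has completed.
    if [ -n "$post_hook" ]; then
        bash -c "$post_hook" || return 1
    fi
}
```

Calling go_main --pre-hook 'echo pre' --post-hook 'echo post' --name demo would run the pre-hook, then the placeholder deployment with --name demo, then the post-hook. Whether a failing pre-hook should really abort the deployment is a design choice this sketch makes for simplicity.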

Persistence of the hook arguments

Tools such as certbot create directories where one can drop scripts that will be executed. The directory approach is convenient, as it allows one to add more scripts to it.

The arguments may be written there, prepended by an appropriate #!/bin/bash shebang, resulting in a complete, standalone script that can also be run from the command line by an adventurous administrator.

The generated script may be named like hooks/pre/000-go-sh-arg.sh to make clear it is automatically generated - and run first.
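As a rough illustration of the persistence idea, the sketch below writes the hook argument into the generated script and then runs every executable file of a hook directory in lexical order, so that the 000- prefix makes it run first. HOOKS_DIR and both function names are assumptions made for the example; the real paths are still to be decided.

```shell
#!/bin/bash
# Sketch only. HOOKS_DIR and both function names are hypothetical.
HOOKS_DIR="${HOOKS_DIR:-hooks}"

# Write the command given to --pre-hook / --post-hook into a generated script.
# $1: "pre" or "post"; $2: the command string.
persist_hook() {
    hook_dir="$HOOKS_DIR/$1"
    hook_file="$hook_dir/000-go-sh-arg.sh"
    mkdir -p "$hook_dir" || return 1
    printf '#!/bin/bash\n%s\n' "$2" > "$hook_file" || return 1
    chmod +x "$hook_file"
}

# Run every executable file of a hook directory, in the order the shell
# expands the glob (lexical), stopping at the first failure.
# $1: "pre" or "post".
run_hooks() {
    for hook in "$HOOKS_DIR/$1"/*; do
        # Skips non-executables, and the unexpanded pattern if nothing matches.
        [ -f "$hook" ] && [ -x "$hook" ] || continue
        "$hook" || return 1
    done
}
```

Writing the command through printf '%s' keeps it verbatim in the generated file, which sidesteps a second round of quoting once the argument has reached go.sh.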

It would be up to this specific script to decide whether it should run every time, or just once. Example:

# Passing this command in a --pre-hook argument from the shell introduces some
# quoting nightmare - this is intended, as the illustration of the quirks we
# might have to deal with
[ -x "hooks/pre/000-go-sh-arg.sh" ] && exit ; echo "I'm running for the first and only time"

This way, we can imagine a pre-install.sh script (which would abort if a lockfile is found, preventing it from running several times) alongside some rolling-releases-and-hotfixes.sh script (which would be intended to run every time).
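The lockfile guard mentioned above could look something like the following sketch. The run_once helper, the LOCK_DIR default and the .done naming are assumptions for illustration only; the appropriate lockfile location is one of the open questions listed at the end of this issue.

```shell
#!/bin/bash
# Sketch only: a run-once guard. LOCK_DIR and the lockfile naming are
# hypothetical; the issue leaves the real paths open.
LOCK_DIR="${LOCK_DIR:-/var/lib/olip/hooks}"

# run_once NAME COMMAND [ARGS...]
# Runs COMMAND unless NAME.done already exists. The lockfile is only
# written when COMMAND succeeds, so a failed run is retried next time.
run_once() {
    lockfile="$LOCK_DIR/$1.done"
    shift
    if [ -e "$lockfile" ]; then
        return 0
    fi
    "$@" || return 1
    mkdir -p "$LOCK_DIR" && : > "$lockfile"
}
```

A run-once script would wrap its work in run_once, while a script meant to run on every deployment would simply not use the guard.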

Example workflow

At install time: --post-hook "ansible-pull http://github.com/.../olip-post-hook main.yml"

This playbook has a bunch of roles/hotfixes/20210106-fix-logrotate.sh-like scripts that the hotfixes role places in the appropriate hooks/post/ directory.
This particular script may delete itself after successful completion, leaving some /var/lib/olip/hotfixes/20210106-fix-logrotate.done lockfile.
The next run of the playbook will check for the existence of such a lockfile, preventing the script from being copied again to the post-hook directory.
All of this is pseudo-code thinking; the egg'n'chicken problem is left as an exercise for further discussion.
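In shell terms, the copy-unless-done and self-delete steps described in this workflow might be sketched as follows. In the workflow itself the copy would be done by the Ansible hotfixes role, so the shell version only illustrates the logic. The two function names and the POST_HOOK_DIR default are hypothetical; the HOTFIX_DONE_DIR default reuses the lockfile directory from the example above.

```shell
#!/bin/bash
# Sketch only. Function names and POST_HOOK_DIR are hypothetical.
HOTFIX_DONE_DIR="${HOTFIX_DONE_DIR:-/var/lib/olip/hotfixes}"
POST_HOOK_DIR="${POST_HOOK_DIR:-hooks/post}"

# Copy each hotfix script from directory $1 into the post-hook directory,
# unless its .done lockfile already exists.
install_hotfixes() {
    mkdir -p "$POST_HOOK_DIR" || return 1
    for src in "$1"/*.sh; do
        [ -f "$src" ] || continue
        name=$(basename "$src" .sh)
        [ -e "$HOTFIX_DONE_DIR/$name.done" ] && continue
        cp "$src" "$POST_HOOK_DIR/" && chmod +x "$POST_HOOK_DIR/$name.sh" || return 1
    done
}

# Called by a hotfix script as its last step: leave the lockfile, then
# remove the script from the post-hook directory.
# $1: path of the calling script (typically "$0").
mark_hotfix_done() {
    name=$(basename "$1" .sh)
    mkdir -p "$HOTFIX_DONE_DIR" || return 1
    : > "$HOTFIX_DONE_DIR/$name.done" || return 1
    rm -f "$1"
}
```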

At install time: --post-hook "git clone http://github.com/.../olip-arbitrary-script /some/path && /some/path/script.sh"

The git-cloned /some/path/script.sh script takes care of running whatever you need, and the hooks/pre/000-go-sh-arg.sh script ensures this git clone is always up to date.
Similarly, /some/path/script.sh could completely replace (or delete) the script generated from the initial --pre-hook argument, which is a nice way to provide a run-once-only pre-hook argument while still integrating a completely different pre-hook intended to run every time 🕶 .

Benefits

The OLIP installation can be customized right from the go.sh command-line, so we don't need to fork olip-deploy to add customization.
The pre/post design allows adding any custom action, whether or not it requires OLIP to be installed first.
Post-hooks allow easy hotfixes.

A non exhaustive list of questions/pitfalls/etc

Appropriate path for the pre/post hooks directories - as FHS-compliant as possible
Appropriate path for the possible lockfiles, as illustrated above - as FHS-compliant as possible
Proper naming of all of those paths / scripts / command-line flags
Should we somehow validate these inputs? For example, by running shellcheck against them, to ensure we end up with valid scripts.
Should the post hook run if the olip-deploy playbook failed? It could then act as a fallback (for instance, if you messed up your SSH/VPN tasks and locked yourself out of the instance - or you may just want to be warned in real time about the failure).
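On the last question, one possible answer is to always run the post hook and let it decide what to do, by exposing the deployment's exit status to it. The sketch below assumes a hypothetical OLIP_DEPLOY_STATUS environment variable and a made-up deploy_then_post helper; neither exists in OLIP today.

```shell
#!/bin/bash
# Sketch only: one possible answer to "should the post hook run after a
# failed deployment?". OLIP_DEPLOY_STATUS is a hypothetical variable name.

# deploy_then_post DEPLOY_CMD POST_HOOK_CMD
# Always runs the post hook, exposing the deployment's exit status to it.
# Returns the deployment's status if it failed, else the post hook's status.
deploy_then_post() {
    status=0
    bash -c "$1" || status=$?

    post_status=0
    OLIP_DEPLOY_STATUS=$status bash -c "$2" || post_status=$?

    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    return "$post_status"
}
```

A post hook could then test OLIP_DEPLOY_STATUS and, for instance, send an alert on failure and skip the device registration.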

Do not hesitate to comment on any point below, (ab)using the threads feature!