Implementation of a pre/post hook system to allow more customization
This issue discusses the implementation of a hook system, which would allow running custom actions before and after the OLIP deployment.
Rationale
The OLIP project is intended to be used by many entities, each with specific needs. We don't all run the same hardware or software; we have different workflows and deployment requirements; etc.
pre-install step
At the very least, we can assume that everyone has to install an OS first (it wouldn't work very well otherwise). Then you might want to install your SSH keys, for instance.
We guess that most people have similar requirements.
As an example, BSF runs a specific playbook at install time to configure our remote access and log aggregation, among other things.
Our integration tooling runs this playbook, then runs the olip-deploy one; both playbooks share a common set of arguments, built by our integration tooling.
post-install step
You might need to perform custom actions after OLIP has been installed. For instance, pre-populating OLIP with a bunch of apps or installing some content; you may want to register your device in your deployed-devices database; etc.
The other need comes from the fact that we are all human beings to begin with (as far as I can guess). What all humans have in common is that they make mistakes. It is very likely that someday you will have something to fix in your pre-install step that requires some cleanup - we did, more often than we'd like to admit. :') ^^'
The problem here is that either you have to add those hotfixes right into the olip-deploy playbook (which you should not do, since OLIP is intended to be as generic as possible and not include your specific needs), or you have to run a post-install script of some sort - again.
the deployment args point
The OLIP go.sh install script requires a bunch of arguments. Chances are that some (if not all) of those arguments are shared with the pre/post-install steps. Having to maintain the parsing and propagation of those arguments across three separate tools is not practical.
The proposal
OLIP may include a hooks system.
A --pre-hook argument would allow one to define a custom command to be run before the actual olip-deploy playbook is run.
- That may be a command such as ansible-pull ... https://github.../olip-pre-install
A --post-hook argument would allow one to define a custom command to be run after the actual olip-deploy playbook has completed.
- That may be a script that hits some API to register the device as actually installed.
- A webhook can be called here, to inform the logistics team that the device is "ready to be shipped".
- In a CI context, that may be a testinfra-based script that performs some QA checks to ensure the device was actually configured as intended.
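To make the proposal more concrete, here is a minimal sketch of how go.sh could wrap the deployment with the two hooks. Everything in it is an assumption made for discussion: the function names (parse_hook_args, run_with_hooks, run_deploy) are hypothetical, run_deploy is only a placeholder for the real olip-deploy playbook call, and running the hook through bash -c is one possible choice among others.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual go.sh code.

PRE_HOOK=""
POST_HOOK=""

# Extract --pre-hook / --post-hook from the command line.
# Any other go.sh argument is simply skipped in this sketch.
parse_hook_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --pre-hook|--post-hook)
                if [ "$#" -lt 2 ]; then
                    echo "$1 requires a command" >&2
                    return 1
                fi
                if [ "$1" = "--pre-hook" ]; then
                    PRE_HOOK="$2"
                else
                    POST_HOOK="$2"
                fi
                shift 2
                ;;
            *)
                shift
                ;;
        esac
    done
}

# Placeholder for the real olip-deploy playbook invocation.
run_deploy() {
    echo "deploy"
}

# Run the pre-hook, the deployment, then the post-hook, stopping at the
# first failure.
run_with_hooks() {
    if [ -n "$PRE_HOOK" ]; then
        bash -c "$PRE_HOOK" || return 1
    fi
    run_deploy || return 1
    if [ -n "$POST_HOOK" ]; then
        bash -c "$POST_HOOK" || return 1
    fi
}
```

With this in place, go.sh would call parse_hook_args "$@" and then run_with_hooks. In this sketch a failing pre-hook aborts before the deployment starts, which is a design choice open to discussion.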
Persistence of the hook arguments
Tools such as certbot create directories where one can drop scripts that will be executed. The directory approach is convenient, as it allows one to add more scripts to it.
The arguments may be written there, prefixed with an appropriate #!/bin/bash shebang, in order to produce a complete, POSIX compliant script that an adventurous administrator can also run from the command line.
The generated script may be named something like hooks/pre/000-go-sh-arg.sh to make clear it is automatically generated - and run first.
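The persistence idea can be sketched as follows. This is only an illustration under assumptions: the HOOKS_DIR variable, its hooks default, and the persist_hook / run_hooks function names are all hypothetical, and the final location of the directories is still an open question.

```shell
#!/bin/bash
# Hypothetical sketch: persist a hook argument as a script, then run every
# script found in the hook directory.

# $1: stage (pre or post), $2: the command passed to --pre-hook/--post-hook
persist_hook() {
    local stage="$1" command="$2"
    local dir="${HOOKS_DIR:-hooks}/$stage"
    local script="$dir/000-go-sh-arg.sh"
    mkdir -p "$dir" || return 1
    # Prefix the raw argument with a shebang so the result is a standalone script.
    printf '#!/bin/bash\n%s\n' "$command" > "$script" || return 1
    chmod +x "$script"
}

# $1: stage (pre or post). Globs expand in sorted order, so the
# 000-prefixed generated script runs before any script dropped in later.
run_hooks() {
    local stage="$1" script
    for script in "${HOOKS_DIR:-hooks}/$stage"/*.sh; do
        [ -x "$script" ] || continue   # also skips the unexpanded glob
        "$script" || return 1
    done
    return 0
}
```

For example, persist_hook pre "$PRE_HOOK" followed by run_hooks pre would write the argument to hooks/pre/000-go-sh-arg.sh and then execute it along with any other executable script in that directory.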
It would be up to the generated script to determine whether it should run every time, or just once. Example:
# Passing this command in a --pre-hook argument from the shell introduces some
# quoting nightmare - this is intended, as the illustration of the quirks we
# might have to deal with
[ -x "hooks/pre/000-go-sh-arg.sh" ] && exit ; echo "I'm running for the first and only time"
This way, we can imagine a pre-install.sh script (which would abort if a lockfile is found, preventing it from running several times) alongside a rolling-releases-and-hotfixes.sh script (which would be intended to run every time).
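A run-once script of this kind could rely on a small helper like the one below. It is a minimal sketch, assuming a hypothetical run_once function and a hypothetical LOCK_DIR variable whose default path is only a placeholder; the appropriate lockfile location is one of the open questions.

```shell
#!/bin/bash
# Hypothetical sketch of a lockfile-guarded, run-once action.

# $1: lock name; remaining arguments: the command to run once
run_once() {
    local name="$1"
    shift
    local lock="${LOCK_DIR:-/var/lib/olip/hooks}/$name.done"
    if [ -e "$lock" ]; then
        echo "skipping $name: lockfile found"
        return 0
    fi
    # Only create the lockfile when the command succeeded, so that a
    # failed attempt is retried on the next run.
    "$@" || return 1
    mkdir -p "$(dirname "$lock")" && touch "$lock"
}
```

A pre-install.sh script could then wrap its body in something like run_once pre-install do_the_install, while a script meant to run every time would simply not use the helper.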
Example workflow
At install time: --post-hook "ansible-pull http://github.com/.../olip-post-hook main.yml"
- This playbook has a bunch of roles/hotfixes/20210106-fix-logrotate.sh-like scripts that the hotfixes role places in the appropriate hooks/post/ directory.
- This particular script may be able to delete itself after successful completion, leaving some /var/lib/olip/hotfixes/20210106-fix-logrotate.done lockfile.
- The next run of the playbook will check for the existence of such a lockfile, preventing the script from being copied again to the post-hook directory.
- All of this is pseudo-code thinking; the chicken-and-egg problem is left as an exercise for further discussion.
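The hotfix lifecycle described above could look roughly like this. In practice the copying would be done by tasks in the hotfixes role; the shell functions below (stage_hotfixes, finish_hotfix) are hypothetical stand-ins that only illustrate the lockfile logic, and all paths are passed in as arguments rather than assumed.

```shell
#!/bin/bash
# Hypothetical sketch of the hotfix staging and self-cleanup logic.

# Copy every hotfix script that has no ".done" lockfile yet.
# $1: source directory, $2: post-hook directory, $3: lockfile directory
stage_hotfixes() {
    local src="$1" dest="$2" locks="$3" script name
    mkdir -p "$dest" || return 1
    for script in "$src"/*.sh; do
        [ -f "$script" ] || continue
        name="$(basename "$script" .sh)"
        # Already applied on a previous run: do not stage it again.
        [ -e "$locks/$name.done" ] && continue
        cp "$script" "$dest/" || return 1
        chmod +x "$dest/$name.sh" || return 1
    done
    return 0
}

# Last step of a hotfix script: leave a lockfile, then delete the script.
# $1: path of the staged script, $2: lockfile directory
finish_hotfix() {
    local script="$1" locks="$2"
    mkdir -p "$locks" || return 1
    touch "$locks/$(basename "$script" .sh).done" || return 1
    rm -f "$script"
}
```

A hotfix script would end with a call such as finish_hotfix "$0" /var/lib/olip/hotfixes, assuming the helper is made available to it in some way.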
At install time: --post-hook "git clone http://github.com/.../olip-arbitrary-script /some/path && /some/path/script.sh"
- The git-cloned /some/path/script.sh script takes care of running whatever you need, and the hooks/pre/000-go-sh-arg.sh script ensures this git clone is always up to date.
- Similarly, /some/path/script.sh could completely replace (or delete) the script generated from the initial --pre-hook argument, which is a nice way to provide a run-once-only pre-hook argument while still integrating a completely different pre-hook intended to be run every time 🕶.
Benefits
- The OLIP installation can be customized right from the go.sh command line, so we don't need to fork olip-deploy to add customization.
- The pre/post design allows adding any custom action, whether or not it requires OLIP to be installed first.
- The post-hooks allow easy hotfixes.
A non-exhaustive list of questions/pitfalls/etc
- Appropriate path for the pre/post hooks directories - as FHS compliant as possible
- Appropriate path for any lockfiles, as illustrated above - as FHS compliant as possible
- A proper naming of all of those paths / scripts / command-line flags
- Should we somehow validate these inputs? For example, run shellcheck against them, in order to ensure we end up with valid scripts.
- Should the post hook run if the olip-deploy playbook failed? It could act as a fallback in such a case (for instance, you messed up your SSH/VPN tasks, locking yourself out of the instance - or you may just want to be warned in real time about the failure).
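On the last question, one possible answer is to always run the post hook and let it decide what to do, by exposing the playbook's exit status to it. A minimal sketch, assuming a hypothetical OLIP_DEPLOY_STATUS environment variable and a hypothetical run_deploy_then_post_hook helper (neither exists in OLIP today):

```shell
#!/bin/bash
# Hypothetical sketch: run the post hook even when the deployment failed.

# $1: post-hook command (may be empty); remaining arguments: deploy command
run_deploy_then_post_hook() {
    local post_hook="$1"
    shift
    local status=0
    "$@" || status=$?
    if [ -n "$post_hook" ]; then
        # The hook can inspect OLIP_DEPLOY_STATUS to tell success from failure.
        OLIP_DEPLOY_STATUS="$status" bash -c "$post_hook"
    fi
    # Report the deployment's own status, whatever the hook did.
    return "$status"
}
```

A hook could then start with a test such as [ "$OLIP_DEPLOY_STATUS" -eq 0 ] and send an alert or restore access when it is non-zero.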
Do not hesitate to comment on any point below, (ab)using the threads feature!