Hooks for jobs
Current situation
Workspaces are created on the fly on execution environments. There is no way to keep them so that their content can be investigated. There is also no way to prevent the dynamic creation of workspaces, although in some cases it would be useful to reuse a pre-populated one.
Expected outcome
A way to control the workspace lifecycle.
Analysis
Keeping an execution environment is one need related to workspace lifecycle. There is also a need to initialize the workspace, as well as a need to perform additional cleanup operations before job completion.
When writing a workflow, it is already possible to insert steps at the beginning or at the end, but this is only possible for jobs explicitly written in the workflow, not for generated jobs.
Also, it could become tedious if the same steps have to be added for many or all jobs, and there is no way for an orchestrator administrator to insert specific steps in this context.
Hooks provide a convenient way to fulfill those needs. Another way would be to introduce some sort of job 'classes' with their own specific behaviors, but that would be a new mechanism and would add complexity.
Solution
The set of events that hooks can be attached to is extended to cover more use cases: there are now two channel events, setup and teardown.
A hook cannot mix channel-specific events with provider-specific events (the category* events). A channel hook must specify at least one channel-specific event.
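To illustrate the rule above, here is a minimal sketch of a hook that attaches to channel events only. The hook name is hypothetical, and listing both channel events in a single hook is an assumption based on the "at least one" wording, not something the text states explicitly.

```yaml
hooks:
- name: trace-job-lifecycle      # hypothetical name, for illustration only
  events:
  # Only channel-specific events are listed; no category* event may appear here
  - channel: setup
  - channel: teardown
  before:
  - run: echo "entering a setup or teardown block"
```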
The before section related to channel: setup events, if present, contains a non-empty list of steps. Those steps are either regular steps or a use-workspace step:
    - use-workspace: /path/to/workspace
use-workspace forces the reuse of the specified workspace. Its content is not cleaned on job start and is kept as-is on job teardown.
The other steps in the before section run in a context where the workspace does not yet exist.
The steps in the after section related to channel: setup events, if present, run in a context where the workspace exists (and is the current directory, as usual).
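The setup behavior described above could look like the following sketch. The hook name and the commands are hypothetical; only the events, before, after, run, and use-workspace keys come from this document.

```yaml
hooks:
- name: prepare-workspace        # hypothetical name
  events:
  - channel: setup
  before:
  # Runs before the workspace exists
  - run: echo "about to set up the workspace"
  # Reuses the given workspace instead of creating a new one
  - use-workspace: /path/to/workspace
  after:
  # Runs once the workspace exists; it is the current directory
  - run: ls -la
```

Because use-workspace is set, the content listed by the after step is whatever a previous job left in that directory.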
The before section related to channel: teardown events, if present, contains a non-empty list of steps. Those steps are either regular steps or a keep-workspace step:
    - keep-workspace: true
keep-workspace can only be used for non-reused workspaces (as reused workspaces are by nature kept). It prevents the workspace cleanup that occurs after a job has completed. It is false by default.
Please note that keeping non-reused workspaces can quickly consume disk space.
The steps in the after section related to channel: teardown events, if present, run in a context where the workspace no longer exists (unless keep-workspace has been set).
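As a sketch of the teardown contexts described above, the hook below does not set keep-workspace, so its before step still sees the workspace while its after step runs once it is gone. The hook name and the commands are hypothetical.

```yaml
hooks:
- name: report-then-clean        # hypothetical name
  events:
  - channel: teardown
  before:
  # The workspace still exists here and is the current directory
  - run: du -sh .
  after:
  # keep-workspace is not set, so the workspace has been removed by now
  - run: echo "workspace removed"
```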
Conditionals can be used on job hooks to restrict their scope. By default, job hooks apply to all jobs.
The setup and teardown blocks are nested if more than one job hook applies. If there is at least one keep-workspace: true statement in the hooks chain, the workspace is kept. The inner-most use-workspace statement is used.
    before_steps from workflow-defined channel setup hooks
        before_steps from channel-defined setup hook 1
            before_steps from channel-defined setup hook 2
                workspace creation (or definition, if use-workspace is set)
            after_steps from channel-defined setup hook 2
        after_steps from channel-defined setup hook 1
    after_steps from workflow-defined channel setup hooks
    (the job steps)
    before_steps from workflow-defined channel teardown hooks
        before_steps from channel-defined teardown hook 1
            before_steps from channel-defined teardown hook 2
                workspace deletion (if use-workspace was not set and keep-workspace is not set)
            after_steps from channel-defined teardown hook 2
        after_steps from channel-defined teardown hook 1
    after_steps from workflow-defined channel teardown hooks
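The inner-most rule for use-workspace can be sketched with two setup hooks that both apply to the same job. The hook names and paths are hypothetical, and the sketch assumes that a hook listed later in the same definition is nested inside the earlier ones, as the ordering above suggests.

```yaml
hooks:
- name: default-workspace        # hypothetical, outer hook
  events:
  - channel: setup
  before:
  - use-workspace: /path/to/default
- name: project-workspace        # hypothetical, inner hook
  events:
  - channel: setup
  before:
  # Inner-most use-workspace statement: under the assumption above,
  # this is the workspace the job would run in
  - use-workspace: /path/to/project
```

keep-workspace is combined differently: a single keep-workspace: true anywhere in the chain is enough for the workspace to be kept.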
Assuming the following hooks definition (either in the workflow or at orchestrator level):
    hooks:
    - name: before-everything
      events:
      - channel: setup
      before:
      - run: echo "first statement, the workspace does not exist yet"
    - name: after-everything
      events:
      - channel: teardown
      after:
      - run: echo "last statement"
    - name: reused-workspace
      events:
      - channel: setup
      if: job.name == 'publish'
      before:
      - use-workspace: /var/otf/workspace/publications
    - name: keep-workspace-on-error
      events:
      - channel: teardown
      if: failure()
      before:
      - keep-workspace: true
      - run: echo "the job has failed, it has been kept for further analysis."
      - run: du -sh .
A job named publish will execute the following statements in a fixed /var/otf/workspace/publications workspace:
    echo 'first statement, the workspace does not exist yet'
    ...
    echo 'last statement'
A job named common will execute the following statements in a newly-provisioned workspace. If the job fails, the workspace is kept:
    echo 'first statement, the workspace does not exist yet'
    ...
    [echo 'the job has failed ...' if the job failed.]
    echo 'last statement'