
Hooks for jobs

Current situation

Workspaces are created on the fly on execution environments. There is no way to keep them after a job so that their content can be investigated. There is also no way to prevent their dynamic creation, although in some cases it would be useful to reuse a pre-populated workspace.

Expected outcome

A way to control the workspace lifecycle.

Analysis

Keeping a workspace on an execution environment is one need related to the workspace lifecycle. There is also a need to initialize the workspace, as well as a need to perform additional cleanup operations before job completion.

When writing a workflow, it is already possible to insert steps at the beginning or at the end, but this is only possible for jobs explicitly written in the workflow, not for generated jobs.

Also, it could become tedious if the same steps have to be added for many or all jobs, and there is no way for an orchestrator administrator to insert specific steps in this context.

Hooks provide a convenient way to fulfill those needs. An alternative would be to introduce some sort of job 'classes' with their own specific behaviors, but that would be a new mechanism and would add complexity.

Solution

The set of events that hooks can be attached to is extended to cover more use cases: there are now two channel events, setup and teardown.

A hook cannot mix channel-specific events with provider-specific events (the category* events). A channel hook must specify at least one channel-specific event.
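The rule above can be sketched as a small check. This is only an illustration, not the orchestrator's code: the function name classify_hook_events is hypothetical, and it assumes that each event is a mapping, that channel events carry a channel key, and that provider events carry keys starting with category.

```python
CHANNEL_EVENTS = {"setup", "teardown"}


def classify_hook_events(events):
    """Return "channel" or "provider" for a valid list of events.

    Raises ValueError if the list is empty, mixes both kinds of events,
    or names an unknown channel event.
    """
    if not events:
        raise ValueError("a hook needs at least one event")
    has_channel = any("channel" in event for event in events)
    # Assumption: provider events are recognized by their category* keys.
    has_provider = any(
        key.startswith("category") for event in events for key in event
    )
    if has_channel and has_provider:
        raise ValueError("channel events cannot be mixed with category* events")
    if not has_channel and not has_provider:
        raise ValueError("no channel or category* event found")
    for event in events:
        if "channel" in event and event["channel"] not in CHANNEL_EVENTS:
            raise ValueError(f"unknown channel event: {event['channel']!r}")
    return "channel" if has_channel else "provider"


# A hook may listen to both channel events.
print(classify_hook_events([{"channel": "setup"}, {"channel": "teardown"}]))
```

A list such as [{"channel": "setup"}, {"category": "checkout"}] (the checkout value is illustrative) would be rejected by this check, since it mixes the two kinds of events.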

The before section related to channel: setup events, if present, contains a non-empty list of steps. Those steps are either regular steps or a use-workspace step:

  - use-workspace: /path/to/workspace

use-workspace forces the reuse of the specified workspace. Its content is not cleaned on job start and is kept as-is on job teardown.

The other steps in the before section run in a context where the workspace does not yet exist.

The steps in the after section related to channel: setup events, if present, run in a context where the workspace exists (and is the current directory, as usual).

The before section related to channel: teardown events, if present, contains a non-empty list of steps. Those steps are either regular steps or a keep-workspace step:

  - keep-workspace: true

keep-workspace can only be used for non-reused workspaces (as reused workspaces are by nature kept).

keep-workspace: true prevents the workspace cleanup that normally occurs after a job has completed. It defaults to false.

Please note that keeping non-reused workspaces can eat your disk space quickly.

The steps in the after section related to channel: teardown events, if present, run in a context where the workspace no longer exists (unless keep-workspace has been set or the workspace is a reused one).
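The workspace lifecycle described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: setup_workspace and teardown_workspace are hypothetical names, and a temporary directory stands in for the workspace that an execution environment would create.

```python
import shutil
import tempfile


def setup_workspace(use_workspace=None):
    """Return (path, reused) for the workspace of a job."""
    if use_workspace is not None:
        # Reused workspace: its content is left as-is, nothing is created.
        return use_workspace, True
    # Default behavior: a fresh workspace is created on the fly.
    return tempfile.mkdtemp(prefix="workspace-"), False


def teardown_workspace(path, reused, keep_workspace=False):
    """Delete the workspace unless it is reused or kept. Return True if deleted."""
    if reused or keep_workspace:
        return False
    shutil.rmtree(path)
    return True


workspace, reused = setup_workspace()
print(teardown_workspace(workspace, reused))  # True: the workspace was deleted
```

In this model, keep_workspace only matters for non-reused workspaces, which matches the restriction stated for keep-workspace.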

Conditionals can be used on job hooks, to restrict their scopes. By default, job hooks apply to all jobs.

The setup and teardown blocks are nested if more than one job hook applies. If there is at least one keep-workspace: true statement in the hooks chain, the workspace is kept. If there are several use-workspace statements, the innermost one is used.
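The two resolution rules can be sketched as follows. The function names are hypothetical, and the sketch assumes that hooks are plain mappings with a before list of steps, given in order from the outermost hook to the innermost one.

```python
def innermost_use_workspace(setup_hooks):
    """Return the path set by the innermost use-workspace step, or None."""
    path = None
    for hook in setup_hooks:  # outermost first, so later hooks override
        for step in hook.get("before", []):
            if "use-workspace" in step:
                path = step["use-workspace"]
    return path


def keep_requested(teardown_hooks):
    """Return True if any hook in the chain sets keep-workspace: true."""
    return any(
        step.get("keep-workspace") is True
        for hook in teardown_hooks
        for step in hook.get("before", [])
    )


setup_chain = [
    {"name": "outer", "before": [{"use-workspace": "/path/to/outer"}]},
    {"name": "inner", "before": [{"use-workspace": "/path/to/inner"}]},
]
teardown_chain = [
    {"name": "report", "before": [{"run": "du -sh ."}]},
    {"name": "keep", "before": [{"keep-workspace": True}]},
]
print(innermost_use_workspace(setup_chain))  # /path/to/inner
print(keep_requested(teardown_chain))        # True
```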

before_steps from workflow-defined channel setup hooks
  before_steps from channel-defined setup hook 1
    before_steps from channel-defined setup hook 2
      workspace creation (or definition, if use-workspace is set)
    after_steps from channel-defined setup hook 2
  after_steps from channel-defined setup hook 1
after_steps from workflow-defined channel setup hooks
(the job steps)
before_steps from workflow-defined channel teardown hooks
  before_steps from channel-defined teardown hook 1
    before_steps from channel-defined teardown hook 2
      workspace deletion (if use-workspace was not set and keep-workspace is not set)
    after_steps from channel-defined teardown hook 2
  after_steps from channel-defined teardown hook 1
after_steps from workflow-defined channel teardown hooks
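The nesting shown above follows a simple pattern: before sections run from the outermost hook to the innermost one, and after sections run in the reverse order. A minimal sketch that produces this layout for one phase, where nested_order is a hypothetical helper and not part of the orchestrator:

```python
def nested_order(hook_names, core):
    """List the steps of one phase, with hooks ordered outermost first."""
    lines = []
    for depth, name in enumerate(hook_names):
        lines.append("  " * depth + f"before_steps from {name}")
    # The core operation (workspace creation or deletion) sits at the center.
    lines.append("  " * len(hook_names) + core)
    for depth, name in reversed(list(enumerate(hook_names))):
        lines.append("  " * depth + f"after_steps from {name}")
    return lines


setup_phase = nested_order(
    ["workflow-defined hooks", "hook 1", "hook 2"], "workspace creation"
)
print("\n".join(setup_phase))
```

Calling the helper once for the setup phase and once for the teardown phase, with the job steps in between, yields the sequence listed above.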

Assuming the following hooks definition (either in the workflow or at orchestrator level):

hooks:
- name: before-everything
  events:
  - channel: setup
  before:
  - run: echo "first statement, the workspace does not exist yet"

- name: after-everything
  events:
  - channel: teardown
  after:
  - run: echo "last statement"

- name: reused-workspace
  events:
  - channel: setup
  if: job.name == 'publish'
  before:
  - use-workspace: /var/otf/workspace/publications

- name: keep-workspace-on-error
  events:
  - channel: teardown
  if: failure()
  before:
  - keep-workspace: true
  - run: echo "the job has failed, its workspace has been kept for further analysis."
  - run: du -sh .

A job named publish will execute the following statements in a fixed /var/otf/workspace/publications workspace:

echo 'first statement, the workspace does not exist yet'
...
echo 'last statement'

A job named common will execute the following statements in a newly provisioned workspace. If the job fails, the workspace is kept:

echo 'first statement, the workspace does not exist yet'
...
[echo 'the job has failed ...' and du -sh . if the job failed.]
echo 'last statement'
Edited by Martin Lafaix