Resolve "Logs transitions and errors for BatchedJob"

Diogo Frazão requested to merge 346359-state-machine-gem-batchedjob into master

What does this MR do and why?

In this MR, we are logging every state transition for batched jobs. If a job fails, we also store the exception raised. This information can be helpful during debugging.

I decided to use a state machine for the batched job model to achieve this.

Why are we using the state_machines-activerecord gem?

  • We already use this gem elsewhere, and we know it is stable
  • It treats events and states as separate concepts
  • It supports observers
  • It supports explicit state and transition definitions
  • It supports callbacks scoped to specific transitions

Specific use for batched jobs:

Logging:

after_transition do |job, transition|
  exception = transition.args.find { |arg| arg[:error].present? }

  Gitlab::ErrorTracking.track_exception(exception[:error], batched_job_id: job.id) if exception

  Gitlab::AppLogger.info(message: 'BatchedJob transition', batched_job_id: job.id, previous_state: transition.from_name, new_state: transition.to_name)
end

We can use the after_transition callback to log every transition and to track any error passed to the event.
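
The error lookup in the callback above relies on transition.args, which holds the arguments passed to the event (for example, failure!(error: e)). The following is a minimal, dependency-free sketch of that lookup, not the MR's code: FakeTransition and extract_error are hypothetical names, and the is_a?(Hash) guard is an added assumption so that non-hash arguments are skipped instead of raising.

```ruby
# Stand-in for the gem's transition object; only `args` is modelled here.
FakeTransition = Struct.new(:args)

# Hypothetical helper: returns the first :error value found among hash
# arguments, or nil when the event was fired without one.
def extract_error(transition)
  options = transition.args.find { |arg| arg.is_a?(Hash) && arg[:error] }
  options && options[:error]
end

with_error    = FakeTransition.new([{ error: ArgumentError.new('boom') }])
without_error = FakeTransition.new([])
non_hash_args = FakeTransition.new([false])

extract_error(with_error)     # the ArgumentError instance
extract_error(without_error)  # nil
extract_error(non_hash_args)  # nil, because non-hash arguments are skipped
```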

State definition:

state_machine :status, initial: :pending do
  state :pending, value: 0
  state :running, value: 1
  state :failed, value: 2
  state :succeeded, value: 3

  event :succeed do
    transition [:pending, :running, :succeeded, :failed] => :succeeded
  end

  event :failure do
    transition [:running, :failed, :succeeded, :pending] => :failed
  end

  event :run do
    transition [:failed, :pending, :running, :succeeded] => :running
  end
end

We can define specific rules for each transition.
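
To illustrate what stricter per-transition rules could look like, here is a plain-Ruby sketch of a transition table. The rules in STRICT_TRANSITIONS and the next_state helper are hypothetical and are not part of this MR, which currently allows every state as a source for every event.

```ruby
# Hypothetical stricter rules, written as event => { from_state => to_state }.
# Illustrative only: the definition above is fully permissive.
STRICT_TRANSITIONS = {
  run:     { pending: :running, failed: :running },
  succeed: { running: :succeeded },
  failure: { running: :failed }
}.freeze

# Returns the target state, or nil when the event is not allowed from `from`.
def next_state(event, from)
  STRICT_TRANSITIONS.fetch(event, {})[from]
end

next_state(:run, :pending)      # => :running
next_state(:succeed, :pending)  # => nil, a job must be running before it can succeed
```

With the gem, a rule like the second one would be expressed as transition running: :succeeded inside the event block.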

Better way to handle transitions:

Currently:

tracking_record.status = :failed
tracking_record.finished_at = Time.current
tracking_record.save!

With the state machine:

# At the call site:
tracking_record.failure!(error: error)

# In the model, finished_at is set by a callback:
state_machine :status do
  before_transition do |job, transition|
    job.finished_at = Time.current if transition.event == :failure
  end
end
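
The effect of replacing the three manual assignments with a single event call can be sketched without the gem. FakeBatchedJob below is a hypothetical hand-rolled stand-in, not the gem's API or the MR's model: it only shows the order in which the timestamp, the status change, and the log entry happen.

```ruby
# Hand-rolled stand-in for a model with a `failure!` event (hypothetical).
class FakeBatchedJob
  attr_reader :status, :finished_at, :log

  def initialize
    @status = :running
    @log = []
  end

  # Mimics firing the :failure event with an optional error.
  def failure!(error: nil)
    previous = @status
    @finished_at = Time.now   # the role of the before_transition hook
    @status = :failed
    # The role of the after_transition hook: record the transition and error.
    @log << { previous_state: previous, new_state: @status, error: error }
    true
  end
end

job = FakeBatchedJob.new
job.failure!(error: RuntimeError.new('timeout'))
job.status  # => :failed
```

The caller no longer needs to remember to set finished_at or to log the change, since both are tied to the event itself.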

Documentation: https://www.rubydoc.info/github/state-machines/state_machines-activerecord/StateMachines/Integrations/ActiveRecord

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #346359 (closed)

Edited by Diogo Frazão
