corruption of Sidekiq cron job enqueue_time prevents Sidekiq from initializing - 'invalid date' logged during crash loop

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Close this issue

Summary

This issue summarises the symptoms and 'fix' for a customer emergency.

Customer reported severe impact to GitLab functionality, and very large numbers of Sidekiq jobs queued
On checking Sidekiq (Admin area > Monitoring > Background jobs) Busy showed as zero
Clicking through to this view, there were no sidekiq processes or jobs listed (Sidekiq is down)
Sidekiq was found to be in a crash loop on their Sidekiq virtual machines: the service in gitlab-ctl status showing an uptime of 2-3 minutes or less. (Roughly the time it takes for Sidekiq to initialize the Rails stack)

Sidekiq logs showed usual initialisation messages, such as:

{"severity":"INFO","time":"YYYY-MM-DDTHH:MM:SS.000Z","message":"Sidekiq x.y.z connecting to Redis with ptions ...

followed by:

invalid date

The top of the stack trace:

/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:631:in `parse'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:631:in `rescue in parse_enqueue_time'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:628:in `parse_enqueue_time'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:290:in `initialize'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:228:in `new'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:228:in `block in all'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:226:in `collect'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:226:in `all'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:534:in `destroy_removed_jobs'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:177:in `load_from_hash!'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/sidekiq.rb:9:in `load_cron_jobs!'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/sidekiq.rb:113:in `block in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/vendor/gems/sidekiq-7.1.6/lib/sidekiq.rb:98:in `configure_server'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/sidekiq.rb:41:in `<top (required)>'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.1/lib/rails/engine.rb:667:in `load'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.1/lib/rails/engine.rb:667:in `block in load_config_initializer'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/activesupport-7.0.8.1/lib/active_support/notifications.rb:208:in `instrument'

Observations
- This is occurring during the initializers, that is: when Sidekiq is booting.
- This can also be reproduced on the Rails console, for example, all three of these lines generate the same error.
```
current_job_names = Sidekiq::Cron::Job.all.filter_map { |j| j.name if j.source == "schedule" }
current_job_names = Sidekiq::Cron::Job.all.map(&:name)
Sidekiq::Cron::Job.destroy_all! 
```

Conclusions from looking at this version of the upstream code (job.rb)

On initialisation, Sidekiq refreshes the cron jobs via the job names.

To populate current_job_names Sidekiq::Cron::Job.all performs Sidekiq::Cron::Job.new on what's been found in Redis

532       # Remove "removed jobs" between current jobs and new jobs
533       def self.destroy_removed_jobs new_job_names
534         current_job_names = Sidekiq::Cron::Job.all.filter_map { |j| j.name if j.source == "schedule" }
535         removed_job_names = current_job_names - new_job_names
536         removed_job_names.each { |j| Sidekiq::Cron::Job.destroy(j) }
537         removed_job_names
538       end

this results in initialize being called, and this attempts to determine last_enqueue_time

288        # Set last enqueue time - from args or from existing job.
289        if args['last_enqueue_time'] && !args['last_enqueue_time'].empty?
290          @last_enqueue_time = parse_enqueue_time(args['last_enqueue_time'])
291        else
292          @last_enqueue_time = last_enqueue_time_from_redis
293        end

We don't need to know last_enqueue_time to perform (for example) destroy_all! - those records are all going to get destroyed anyway, but I guess it's part of creating objects for all the cronjobs.

Steps to reproduce

Unknown.

This might be random corruption of data written to Redis, or it might be a locale issue, though in that case it'd more likely to be reproducible and seen more widely. For example, there's an upstream PR related to Japanese formatting in datestamps: https://github.com/sidekiq-cron/sidekiq-cron/pull/354

Workaround

Stop Sidekiq and Puma on one node - gitlab-ctl start sidekiq ; gitlab-ctl start puma

Back up job.rb

   # version specific directories replaced with * - this might need manual modification

cd /opt/gitlab/embedded/lib/ruby/gems/*/gems/sidekiq-cron-*/lib/sidekiq/cron
cp -a job.rb job.rb_bkp

Modify def initialize so @last_enqueue_time is NOT set via parse_enqueue_time

before

     # Set last enqueue time - from args or from existing job.
     if args['last_enqueue_time'] && !args['last_enqueue_time'].empty?
       @last_enqueue_time = parse_enqueue_time(args['last_enqueue_time'])
     else
       @last_enqueue_time = last_enqueue_time_from_redis
     end

after

     # Set last enqueue time - from args or from existing job.
     #if args['last_enqueue_time'] && !args['last_enqueue_time'].empty?
     #  @last_enqueue_time = parse_enqueue_time(args['last_enqueue_time'])
     #else
       @last_enqueue_time = last_enqueue_time_from_redis
     #end

Start the Rails console on this node: gitlab-rails console
Remove all cron jobs (via this modified code)
```
Sidekiq::Cron::Job.destroy_all!
```
Quit the rails console
Revert the code change
```
mv job.rb_bkp job.rb
```
Start sidekiq: gitlab-ctl start sidekiq; gitlab-ctl start puma

If there are other nodes running Sidekiq, you should find that they will have already initialized successfully and are now processing jobs.

Example Project

What is the current bug behavior?

If Sidekiq::Cron::Job.all fails (in this case for parse_enqueue_time but it could be a problem with other attributes) Sidekiq gets into state where it can neither start, nor can the cron jobs be removed as the code to do so depends on Sidekiq::Cron::Job.all

What is the expected correct behavior?

IF we find this is a broader problem we might need a rake task that can go direct to Redis and deal with the cron jobs. But for the moment, %Awaiting further demand

Relevant logs and/or screenshots

Output of checks

Results of GitLab environment info

This occurred on 16.11

Expand for output related to GitLab environment info


(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check

Expand for output related to the GitLab application check

(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true)
(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true)
(we will only investigate if the tests are passing)

Possible fixes

Edited Aug 21, 2025 by 🤖 GitLab Bot 🤖