ClearDatabaseCacheWorker keeps retrying and causes high amount of table bloat

We ran ClearDatabaseCacheWorker manually via a Rake task. The job failed and retried, which caused a large amount of table bloat and replication lag. For more details, see: https://gitlab.com/gitlab-com/infrastructure/issues/1576#note_27127622.

It appears that the Sidekiq job kept retrying over the course of 24 hours:

[image]

Several problems here:

  1. It should not retry so many times
  2. We should limit the amount of table bloat this causes in production
  3. It should resume from where it left off rather than re-clearing the same database rows
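For points 2 and 3, one possible direction is to clear rows in small batches and record a checkpoint after each batch, so a retry picks up after the last cleared id instead of starting over. The following is only a rough sketch in plain Ruby under stated assumptions: `BatchedCacheClearer`, the `:cache` column, and the `checkpoint` hash are all hypothetical names, the hash of rows stands in for a real table, and the checkpoint would need to live somewhere persistent (Redis, for example) in practice. It is not how the worker is currently implemented.

```ruby
# Hypothetical sketch: none of these names come from the GitLab codebase.
class BatchedCacheClearer
  DEFAULT_BATCH_SIZE = 100

  # rows:       in-memory stand-in for a table, shaped like { id => { cache: "..." } }
  # checkpoint: stand-in for a persistent store that remembers the last cleared id
  def initialize(rows, checkpoint, batch_size: DEFAULT_BATCH_SIZE)
    @rows = rows
    @checkpoint = checkpoint
    @batch_size = batch_size
  end

  # Clears the cached column one batch at a time, starting after the
  # checkpointed id. The checkpoint is updated after every batch, so a
  # failure part-way through only costs the batch that was in progress.
  # Each cleared batch is yielded so the caller can throttle between batches.
  def run
    loop do
      last_id = @checkpoint.fetch(:last_id, 0)
      ids = @rows.keys.select { |id| id > last_id }.sort.first(@batch_size)
      break if ids.empty?

      ids.each { |id| @rows[id][:cache] = nil }
      @checkpoint[:last_id] = ids.last
      yield ids if block_given?
    end
  end
end
```

Small batches with a pause between them (in the block passed to `run`) would give autovacuum and replication a chance to keep up, which may help with the bloat. For point 1, Sidekiq lets a worker cap its retries with `sidekiq_options retry: N`, so a low limit there could be combined with the checkpoint above.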

Any other ideas, @pcarranza?

/cc: @nick.thomas