2019-07-03 & -04 & -05: Repository mirroring delays

Summary

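Over three days, the number of overdue repository mirror updates rose repeatedly. In response, we added Sidekiq workers and repeatedly adjusted the capacity of repository mirroring (between 960 and 25000). On 2019-07-05 we manually scheduled all the overdue mirrors, the number of overdue projects dropped to 0, and we re-enabled UpdateAllMirrorsWorker.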

  • Service(s) affected : ~"Service:Sidekiq"
  • Team attribution : Infra
  • Minutes downtime or degradation :

Timeline

2019-07-03

  • 09:21 UTC - Number of overdue updates is rising
  • 09:30 UTC - We got paged about it
  • 09:55 UTC - Runbook steps were executed (Sidekiq queue numbers were fine, pgbouncer didn't seem held up, no significant errors in Sentry)
  • 10:40 UTC - Number of overdue updates is starting to go down
  • 11:37 UTC - Number of overdue updates is starting to go up again
  • 11:40 UTC - We add more Sidekiq workers to help alleviate the problem
  • 12:00 UTC - We temporarily increase the capacity of repository mirroring to 2500 (was 960)
  • 12:38 UTC - Number of overdue updates is starting to go down
  • 13:07 UTC - Number of overdue updates is starting to go up again
  • 13:25 UTC - We increase the capacity of repository mirroring to 25000
  • 13:30 UTC - Sharp drop in number of overdue updates
  • 13:41 UTC - All overdue updates are now scheduled
  • 14:07 UTC - Capacity of repository mirroring is set back to 960
  • 14:08 UTC - Number of overdue updates is starting to go up
  • 14:14 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
  • 14:18 UTC - Capacity of repository mirroring is set to 5000
  • 14:47 UTC - Number of overdue updates is starting to go down
  • 14:51 UTC - Capacity of repository mirroring is set to 10000
  • 15:16 UTC - Number of overdue updates is starting to go up
  • 15:20 UTC - Capacity of repository mirroring is set back to 25000
  • 15:31 UTC - Number of overdue updates is starting to go down
  • 16:36 UTC - We start removing the extra Sidekiq workers

2019-07-04

  • 10:53 UTC - Capacity of repository mirroring is set to 2500
  • 13:11 UTC - Number of overdue updates is starting to go up
  • 13:15 UTC - Capacity of repository mirroring is set to 3800
  • 13:25 UTC - Capacity of repository mirroring is set to 4100
  • 13:50 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
  • 13:55 UTC - Capacity of repository mirroring is set back to 2500
  • 14:00 UTC - Number of overdue updates is starting to go down
  • 14:23 UTC - Number of overdue updates is starting to go up
  • 14:25 UTC - Capacity of repository mirroring is set to 5000
  • 14:40 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
  • 14:56 UTC - Number of overdue updates is starting to go down

2019-07-05

  • 08:17 UTC - Number of overdue updates is starting to go up
  • 09:17 UTC - Capacity of repository mirroring is set to 25000
  • 09:36 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
  • 10:35 UTC - Capacity of repository mirroring is set to 960
  • 11:17 UTC - Capacity of repository mirroring is set to 2000
  • 11:42 UTC - We manually schedule all the overdue mirrors
  • 11:55 UTC - Number of overdue updates is starting to go down
  • 12:08 UTC - Number of overdue projects is now 0
  • 12:10 UTC - We re-enabled UpdateAllMirrorsWorker
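
The capacity changes in the timeline above are easier to review as intervals. The following is a minimal sketch, not part of the original investigation: it copies the capacity values and timestamps from the timeline and computes how long each setting stayed in effect. It assumes the timeline records every capacity change; the name capacity_intervals is hypothetical.

```python
from datetime import datetime

# Capacity changes copied from the timeline (all times UTC).
# Before the first entry the capacity was 960 (per the 12:00 entry on 2019-07-03).
CAPACITY_CHANGES = [
    ("2019-07-03 12:00", 2500),
    ("2019-07-03 13:25", 25000),
    ("2019-07-03 14:07", 960),
    ("2019-07-03 14:18", 5000),
    ("2019-07-03 14:51", 10000),
    ("2019-07-03 15:20", 25000),
    ("2019-07-04 10:53", 2500),
    ("2019-07-04 13:15", 3800),
    ("2019-07-04 13:25", 4100),
    ("2019-07-04 13:55", 2500),
    ("2019-07-04 14:25", 5000),
    ("2019-07-05 09:17", 25000),
    ("2019-07-05 10:35", 960),
    ("2019-07-05 11:17", 2000),
]


def capacity_intervals(changes):
    """Return (start, end, capacity, minutes) for each pair of consecutive changes.

    Assumption: no capacity change happened between two listed entries.
    The last entry has no known end time, so it produces no interval.
    """
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), cap) for ts, cap in changes]
    intervals = []
    for (start, cap), (end, _next_cap) in zip(parsed, parsed[1:]):
        minutes = int((end - start).total_seconds() // 60)
        intervals.append((start, end, cap, minutes))
    return intervals


if __name__ == "__main__":
    for start, end, cap, minutes in capacity_intervals(CAPACITY_CHANGES):
        print(f"{start:%m-%d %H:%M} -> {end:%m-%d %H:%M}  capacity={cap:>5}  {minutes:>5} min")
```

Reading the output next to the "overdue updates" entries makes it easier to see which capacity settings coincided with the backlog rising or falling. Intervals that span overnight are only as accurate as the timeline itself.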
Edited Jul 05, 2019 by Ahmad Sherif