2019-07-03 & -04 & -05: Repository mirroring delays
Summary
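From 2019-07-03 to 2019-07-05, the number of overdue repository mirror updates climbed repeatedly. Raising the repository mirroring capacity (from 960 up to 25000) and adding Sidekiq workers brought the backlog down only temporarily each time. On 2019-07-05 all overdue mirrors were scheduled manually, the overdue count reached 0 at 12:08 UTC, and UpdateAllMirrorsWorker was re-enabled.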
- Service(s) affected : ~"Service:Sidekiq"
- Team attribution : Infra
- Minutes downtime or degradation :
Timeline
2019-07-03
- 09:21 UTC - Number of overdue updates is rising
- 09:30 UTC - We got paged about it
- 09:55 UTC - Runbook steps were executed (Sidekiq queue numbers were fine, pgbouncer didn't seem held up, no significant errors in Sentry)
- 10:40 UTC - Number of overdue updates is starting to go down
- 11:37 UTC - Number of overdue updates is starting to go up again
- 11:40 UTC - We add more Sidekiq workers to help alleviate the problem
- 12:00 UTC - We temporarily increase the capacity of repository mirroring to 2500 (was 960)
- 12:38 UTC - Number of overdue updates is starting to go down
- 13:07 UTC - Number of overdue updates is starting to go up again
- 13:25 UTC - We increase the capacity of repository mirroring to 25000
- 13:30 UTC - Sharp drop in number of overdue updates
- 13:41 UTC - All overdue updates are now scheduled
- 14:07 UTC - Capacity of repository mirroring is set back to 960
- 14:08 UTC - Number of overdue updates is starting to go up
- 14:14 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
- 14:18 UTC - Capacity of repository mirroring is set to 5000
- 14:47 UTC - Number of overdue updates is starting to go down
- 14:51 UTC - Capacity of repository mirroring is set to 10000
- 15:16 UTC - Number of overdue updates is starting to go up
- 15:20 UTC - Capacity of repository mirroring is set back to 25000
- 15:31 UTC - Number of overdue updates is starting to go down
- 16:36 UTC - Started removing the extra Sidekiq workers
2019-07-04
- 10:53 UTC - Capacity of repository mirroring is set to 2500
- 13:11 UTC - Number of overdue updates is starting to go up
- 13:15 UTC - Capacity of repository mirroring is set to 3800
- 13:25 UTC - Capacity of repository mirroring is set to 4100
- 13:50 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
- 13:55 UTC - Capacity of repository mirroring is set back to 2500
- 14:00 UTC - Number of overdue updates is starting to go down
- 14:23 UTC - Number of overdue updates is starting to go up
- 14:25 UTC - Capacity of repository mirroring is set to 5000
- 14:40 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
- 14:56 UTC - Number of overdue updates is starting to go down
2019-07-05
- 08:17 UTC - Number of overdue updates is starting to go up
- 09:17 UTC - Capacity of repository mirroring is set to 25000
- 09:36 UTC - Adding more Sidekiq workers to keep up with the scheduled jobs
- 10:35 UTC - Capacity of repository mirroring is set to 960
- 11:17 UTC - Capacity of repository mirroring is set to 2000
- 11:42 UTC - We manually schedule all the overdue mirrors
- 11:55 UTC - Number of overdue updates is starting to go down
- 12:08 UTC - Number of overdue projects is now 0
- 12:10 UTC - We re-enabled UpdateAllMirrorsWorker
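
The "Minutes downtime or degradation" field above is still empty. As a rough aid for filling it in, the following sketch computes the elapsed minutes between pairs of timeline entries. Which entries mark the start and end of a degradation window is an assumption made here for illustration only; the timestamps are copied from the timeline, but the pairing is not an official measurement.

```python
from datetime import datetime

# Timestamps are copied from the timeline (all UTC).
# ASSUMPTION: pairing a "number of overdue updates is rising" entry with a
# later recovery entry is only one possible way to define a window.
WINDOWS = [
    # first rise -> all overdue updates scheduled
    ("2019-07-03 09:21", "2019-07-03 13:41"),
    # rise -> number of overdue projects is 0
    ("2019-07-05 08:17", "2019-07-05 12:08"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M"


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two UTC timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    for start, end in WINDOWS:
        print(f"{start} -> {end}: {minutes_between(start, end)} min")
```

Other windows, such as the repeated rises on 2019-07-04, can be added to `WINDOWS` in the same way once the team agrees on what counts as degradation.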
Edited by Ahmad Sherif