2018-04-11 Outage due to deploy problems
(all times in UTC)
- At 16:38 we got the first Pingdom alert: https://gitlab.slack.com/archives/C101F3796/p1523471919000230
There seem to be two issues:
- Invalid cache
- Some of the 500 errors seem to be caused by the problem anticipated in https://gitlab.com/gitlab-org/gitlab-ee/issues/5571: the 'current_appearance' cache entry has to be cleared by running
Rails.cache.delete('current_appearance')
This was supposed to run in the post-deploy step, but the deploy never reached that step because of the second issue:
- Web-11 deploy didn't finish correctly.
- @jameslopez noted that we were getting ENOENT errors (https://sentry.gitlap.com/gitlab/gitlabcom/issues/160883/), indicating that unicorn workers were not restarted (or HUPed) after a gem change. The gem change was introduced in 10.7: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17924
- @ilyaf noted the following on that node:
Start-Date: 2018-04-11 18:17:56
Commandline: apt-get install -y -q --force-yes gitlab-ee=10.7.0-rc4.ee.0
Requested-By: mayra-cabrera (1113)
Upgrade: gitlab-ee:amd64 (10.6.3-ee.0, 10.7.0-rc4.ee.0)
Error: Sub-process /usr/bin/dpkg returned an error code (1)
End-Date: 2018-04-11 18:19:21
Unpacking gitlab-ee (10.7.0-rc4.ee.0) over (10.6.3-ee.0) ...
dpkg: error processing archive /var/cache/apt/archives/gitlab-ee_10.7.0-rc4.ee.0_amd64.deb (--unpack):
unable to stat './opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/RedCloth-4.3.2/lib/redcloth_scan.so' (which I was about to install): Structure needs cleaning
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
[35877.084041] nfs: server 10.70.2.116 not responding, timed out
[35877.136041] nfs: server 10.70.2.116 not responding, timed out
[36763.516694] EXT4-fs error (device sda1): ext4_ext_check_inode:510: inode #1548382: comm ruby: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[36836.722287] EXT4-fs error (device sda1): ext4_ext_check_inode:510: inode #1548382: comm dpkg: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[36860.034581] EXT4-fs error (device sda1): ext4_ext_check_inode:510: inode #1548382: comm gitlab-ee.posti: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
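The kernel messages above suggest a check that could run on a node before the package upgrade starts. The following is only a minimal sketch, not part of the existing deploy tooling: the method names `scan_kernel_log` and `safe_to_deploy?` are hypothetical, the patterns are derived solely from the dmesg lines quoted above, and the policy (any EXT4 error blocks the deploy, NFS timeouts are only reported) is an assumption.

```ruby
# Hypothetical pre-deploy check: classify kernel log lines so that a deploy
# can be skipped on a node that reports filesystem or NFS trouble.

# Returns a hash listing the EXT4 errors and NFS timeouts found in +lines+.
def scan_kernel_log(lines)
  # Patterns modelled only on the dmesg output quoted in this issue.
  ext4_error  = /EXT4-fs error \(device (?<device>[^)]+)\): .*?inode #(?<inode>\d+): comm (?<comm>[^:]+):/
  nfs_timeout = /nfs: server (?<server>\S+) not responding/

  findings = { ext4_errors: [], nfs_timeouts: [] }
  lines.each do |line|
    if (m = ext4_error.match(line))
      findings[:ext4_errors] << {
        device: m[:device],
        inode: m[:inode].to_i,
        comm: m[:comm]
      }
    elsif (m = nfs_timeout.match(line))
      findings[:nfs_timeouts] << m[:server]
    end
  end
  findings
end

# Assumed policy: any EXT4 error means the node should not be upgraded.
def safe_to_deploy?(findings)
  findings[:ext4_errors].empty?
end
```

Fed the dmesg lines above, this would flag inode #1548382 on sda1 for the `ruby`, `dpkg` and `gitlab-ee.posti` processes, so the node would be held back from the deploy under the assumed policy.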
/cc @gl-infra
Edited by Alejandro Rodríguez