Gitaly crashes
Gitaly crashed today with a segmentation violation at 2019-09-17 05:55:46 UTC, followed by a very long stack trace of thousands of goroutines, starting with:
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x4c101b pc=0x4c101b]
goroutine 215065130 [running]:
runtime.throw(0xd4b229, 0x5)
/usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc020736648 sp=0xc020736618 pc=0x42f362
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:397 +0x401 fp=0xc020736678 sp=0xc020736648 pc=0x4448d1
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00af1bbc0, 0xc00ea69000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:169 +0x19b fp=0xc0207366d0 sp=0xc020736678 pc=0x4c101b
os.(*File).read(...)
/usr/local/go/src/os/file_unix.go:263
os.(*File).Read(0xc031d2ca78, 0xc00ea69000, 0x1000, 0x1000, 0x44077b, 0xc00bc330e0, 0xc00e6dbf20)
/usr/local/go/src/os/file.go:108 +0x70 fp=0xc020736740 sp=0xc0207366d0 pc=0x4c80f0
gitlab.com/gitlab-org/gitaly/internal/command.(*Command).Read(0xc0001b42d0, 0xc00ea69000, 0x1000, 0x1000, 0x4310bf, 0xdc6be8, 0xc000044f00)
/var/cache/omnibus/src/gitaly/internal/command/command.go:101 +0x5a fp=0xc020736788 sp=0xc020736740 pc=0xa1929a
bufio.(*Reader).fill(0xc00583c240)
/usr/local/go/src/bufio/bufio.go:100 +0x10f fp=0xc0207367d8 sp=0xc020736788 pc=0x554ccf
bufio.(*Reader).ReadSlice(0xc00583c240, 0xc01742fd0a, 0xc000ecd638, 0xc020736880, 0x42e981, 0xdc6d68, 0xc020736890)
/usr/local/go/src/bufio/bufio.go:356 +0x3d fp=0xc020736820 sp=0xc0207367d8 pc=0x555a1d
bufio.(*Reader).ReadBytes(0xc00583c240, 0xc02073680a, 0xc020736950, 0x474aba, 0x158f5e0, 0xc016ccf180, 0x0)
/usr/local/go/src/bufio/bufio.go:434 +0x70 fp=0xc0207368e0 sp=0xc020736820 pc=0x555ec0
bufio.(*Reader).ReadString(...)
/usr/local/go/src/bufio/bufio.go:474
gitlab.com/gitlab-org/gitaly/internal/git/catfile.ParseObjectInfo(0xc00583c240, 0xc031d2ca70, 0xc020736a98, 0x1)
/var/cache/omnibus/src/gitaly/internal/git/catfile/objectinfo.go:33 +0x49 fp=0xc0207369f0 sp=0xc0207368e0 pc=0xab6b09
gitlab.com/gitlab-org/gitaly/internal/git/catfile.(*batchProcess).reader(0xc00b4482d0, 0xc00ad93440, 0x28, 0xd4bf7d, 0x6, 0x0, 0x0, 0x0, 0x0)
/var/cache/omnibus/src/gitaly/internal/git/catfile/batch.go:85 +0x26c fp=0xc020736ae8 sp=0xc0207369f0 pc=0xab3a8c
gitlab.com/gitlab-org/gitaly/internal/git/catfile.(*Batch).Commit(0xc00b4484e0, 0xc00ad93440, 0x28, 0xc018bff3b0, 0x0, 0x0, 0x31)
/var/cache/omnibus/src/gitaly/internal/git/catfile/catfile.go:95 +0xce fp=0xc020736b40 sp=0xc020736ae8 pc=0xab5dee
gitlab.com/gitlab-org/gitaly/internal/git/log.GetCommitCatfile(0xc00b4484e0, 0xc019d0a090, 0x28, 0x28, 0xc019d0a090, 0x28)
/var/cache/omnibus/src/gitaly/internal/git/log/commit.go:37 +0xb5 fp=0xc020736b90 sp=0xc020736b40 pc=0xab8965
gitlab.com/gitlab-org/gitaly/internal/service/ref.newFindLocalBranchesWriter.func1(0xc0258c6900, 0x14, 0x20, 0xc0208d7301, 0xc0208d7380)
/var/cache/omnibus/src/gitaly/internal/service/ref/util.go:88 +0x14f fp=0xc020736c80 sp=0xc020736b90 pc=0xac670f
That was followed by a loop of hundreds of Gitaly restarts by the supervisor between 05:55:48 and 06:03:04 UTC, because Gitaly crashed again immediately after every start with another SIGSEGV:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xa9ee4f]
goroutine 151 [running]:
gitlab.com/gitlab-org/gitaly/internal/helper/housekeeping.fixDirectoryPermissions.func1(0xc003d1e230, 0x44, 0x0, 0x0, 0xe971c0, 0xc0015661b0, 0x10, 0xc52cc0)
/var/cache/omnibus/src/gitaly/internal/helper/housekeeping/housekeeping.go:70 +0x2f
path/filepath.Walk(0xc003d1e230, 0x44, 0xc0065bc020, 0xd0f880, 0xc001566180)
/usr/local/go/src/path/filepath/path.go:402 +0x6a
gitlab.com/gitlab-org/gitaly/internal/helper/housekeeping.fixDirectoryPermissions(0xc003d1e230, 0x44, 0xc001566180, 0xc003d1e230, 0x44)
/var/cache/omnibus/src/gitaly/internal/helper/housekeeping/housekeeping.go:69 +0x6f
gitlab.com/gitlab-org/gitaly/internal/helper/housekeeping.FixDirectoryPermissions(...)
/var/cache/omnibus/src/gitaly/internal/helper/housekeeping/housekeeping.go:63
gitlab.com/gitlab-org/gitaly/internal/tempdir.clean(0xc0001fe080, 0x31, 0x2, 0xc0001fe080)
/var/cache/omnibus/src/gitaly/internal/tempdir/tempdir.go:142 +0x278
gitlab.com/gitlab-org/gitaly/internal/tempdir.StartCleaning.func1(0xc0005321e2, 0x7, 0xc0005321f3, 0x25)
/var/cache/omnibus/src/gitaly/internal/tempdir/tempdir.go:102 +0xdf
created by gitlab.com/gitlab-org/gitaly/internal/tempdir.StartCleaning
/var/cache/omnibus/src/gitaly/internal/tempdir/tempdir.go:100 +0x91
{"gitaly":2886,"level":"warning","msg":"forwarding signal","signal":17,"time":"2019-09-17T05:55:49Z","wrapper":2879}
{"error":"os: process already finished","gitaly":2886,"level":"error","msg":"can't forward the signal","signal":17,"time":"2019-09-17T05:55:49Z","wrapper":2879}
{"gitaly":2886,"level":"error","msg":"wrapper for gitaly shutting down","time":"2019-09-17T05:55:49Z","wrapper":2879}
One thing that stood out on file-33 is that /tmp had wrong permissions (gitlab-com/gl-infra/production#1159), but I'm not sure yet whether that is related to the Gitaly crashes.