No logs for Gitaly since 11.5.0-rc4-ee

We stopped seeing Gitaly logs for GRPC calls in Kibana right around the time of the deploy of 11.5.0-rc4-ee (see for example https://log.gitlab.net/goto/e4043f690834136b2adc21577b4ee32e), and @ahanselka noticed a lot of errors like these on pubsub:

 WARN    elasticsearch/client.go:502    Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xbef2947004242581, ext:112072138997, loc:(*time.Location)(0x242f380)}, Meta:common.MapStr(nil), Fields:common.MapStr{"message":"{\"grpc.code\":\"OK\",\"grpc.meta.auth_version\":\"v2\",\"grpc.meta.client_name\":\"gitlab-web\",\"grpc.method\":\"FindCommits\",\"grpc.request.deadline\":\"2018-11-12T19:54:02Z\",\"grpc.request.fullMethod\":\"/gitaly.CommitService/FindCommits\",\"grpc.request.glRepository\":\"project-13083\",\"grpc.request.repoPath\":\"@hashed/94/d1/94d17a47b559db4e59e13ca10c0f1037ac49f3050e53adbf59ebc2feb528c344.git\",\"grpc.request.repoStorage\":\"nfs-file08\",\"grpc.request.topLevelGroup\":\"@hashed\",\"grpc.service\":\"gitaly.CommitService\",\"grpc.start_time\":\"2018-11-12T19:53:32Z\",\"grpc.time_ms\":146.211,\"level\":\"info\",\"msg\":\"finished streaming call with code OK\",\"peer.address\":\"10.220.8.20:54790\",\"span.kind\":\"server\",\"system\":\"grpc\",\"time\":\"2018-11-12T19:53:32Z\",\"tag\":\"gitaly\",\"environment\":\"gprd\",\"hostname\":\"file-08-stor-gprd\",\"fqdn\":\"file-08-stor-gprd.c.gitlab-production.internal\"}", "json":common.MapStr{"hostname":"file-08-stor-gprd", "grpc.request.repoPath":"@hashed/94/d1/94d17a47b559db4e59e13ca10c0f1037ac49f3050e53adbf59ebc2feb528c344.git", "grpc.start_time":"2018-11-12T19:53:32Z", "grpc.method":"FindCommits", "grpc.request.fullMethod":"/gitaly.CommitService/FindCommits", "grpc.request.topLevelGroup":"@hashed", "level":"info", "span.kind":"server", "grpc.meta.auth_version":"v2", "grpc.request.deadline":"2018-11-12T19:54:02Z", "grpc.request.glRepository":"project-13083", "grpc.meta.client_name":"gitlab-web", "system":"grpc", "fqdn":"file-08-stor-gprd.c.gitlab-production.internal", "grpc.request.repoStorage":"nfs-file08", "peer.address":"10.220.8.20:54790", "tag":"gitaly", "msg":"finished streaming call with code OK", "grpc.code":"OK", "environment":"gprd", "time":"2018-11-12T19:53:32Z", "grpc.service":"gitaly.CommitService", "grpc.time_ms":146.211}, "beat":common.MapStr{"name":"pubsub-gitaly-inf-gprd", "hostname":"pubsub-gitaly-inf-gprd", "version":"6.2.2"}, "type":"pubsub-gitaly-inf-gprd", "message_id":"284653152909567", "publish_time":common.Time{wall:0x2349340, ext:63677649216, loc:(*time.Location)(nil)}}, Private:interface {}(nil)}, Flags:0x0} (status=400): {"type":"mapper_parsing_exception","reason":"object mapping for [json.peer.address] tried to parse field [peer.address] as object, but found a concrete value"}

During investigation we found that 11.5.0-rc4-ee bumps Gitaly version from 0.126.0 to 0.129.0. Among those changes is gitlab-org/gitaly!954 (merged) which, specifically on this line, changes the format of field peer.address from an object to a string. In a somewhat related forum post an Elastic team member points out that an index will always expect the same value type for each field until the index is deleted. You can double-check the current mapping of field json.peer.address on Elastic's API console querying for route [GET] /pubsub-gitaly-inf-ops-2018.11.10/_mapping.

Since this is leaving us with reduced visibility on Gitaly (we do have gitaly-ruby logs, those didn't seem to change format, but not gRPC invocation logs for the main Gitaly server) we could:

  • Rollback to 11.5.0-rc3-ee, which will in turn rollback to v0.126.0 of Gitaly. I'm not sure if this issue warrants that.
  • Rollback just to v0.126.0 on Gitaly nodes. This may involve more manual labor than a full rollback.
  • Release a new version of gitaly that fixes the issue, and deploy that new version. That fix could either be rolling back gitlab-org/gitaly!954 (merged) or somehow overriding the field to have a new name.
  • Do some configuration change that remaps the field to a new name, or transforms it into an object somewhere outside of Gitaly? (On kibana's or pubsub configuration? I'm not sure if this is possible TBH).

@ahanselka and I discussed these options but none seemed straightforward/fast enough to at least get back visibility while we settle on a definite solution.

/cc @gitlab-com/gl-infra @gl-gitaly @bjk-gitlab @andrewn

Assignee Loading
Time tracking Loading