Do not index a malformed field in ES
The field json.jsonPayload.err.detail is found in the GKE indexes. It is emitted by the Docker registry, and is sometimes a string, sometimes an object. Once we stop indexing it, we will no longer be able to search on it, but all logs should end up in ES instead of some being dropped because the value's type conflicts with the existing mapping.
Related: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8931
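For illustration only, one way to stop Elasticsearch from parsing a field is to map it as an object with `enabled` set to false. Elasticsearch then skips parsing the field's contents, so both string and object values are accepted and kept in `_source`, but the field cannot be searched. The sketch below builds such a mapping fragment for the field named in the error; the helper `disabled_field_mapping` is hypothetical, and the mapping this MR actually ships may differ.

```python
import json


def disabled_field_mapping(path):
    """Build a nested mapping fragment that disables parsing of the
    field at the dotted `path` (hypothetical helper for illustration)."""
    # "enabled": false applies to object fields; ES skips parsing the value.
    mapping = {"type": "object", "enabled": False}
    # Wrap the leaf in one "properties" level per path segment, innermost first.
    for part in reversed(path.split(".")):
        mapping = {"properties": {part: mapping}}
    return mapping


mapping = disabled_field_mapping("json.jsonPayload.err.detail")
print(json.dumps(mapping, indent=2))
```

The printed fragment would sit under the `mappings` key of an index template or index creation request.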
The type of error this is trying to squash:
2020-01-20_14:43:46.70264 2020-01-20T14:43:46.702Z WARN elasticsearch/client.go:511 Cannot index event
publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xbf818d0863a4d982,
ext:256079236860397, loc:(*time.Location)(0x3dcf440)}, Meta:common.MapStr(nil),
Fields:common.MapStr{"agent":common.MapStr{"ephemeral_id":"cd9ba012-45e1-4430-847e-c10b01042090",
"hostname":"pubsub-duplicate-gke-inf-gprd",
"id":"a7d37e08-f3d2-45a7-baea-9568c0d2ca74", "type":"pubsubbeat",
"version":"7.5.1"},
"attributes":common.MapStr{"logging.googleapis.com/timestamp":"2020-01-20T14:43:44.028123409Z"},
"ecs":common.MapStr{"version":"1.4.0"},
"host":common.MapStr{"name":"pubsub-duplicate-gke-inf-gprd"},
"json":common.MapStr{"insertId":"1nxxvc3g23gyoc6",
"jsonPayload":common.MapStr{"auth.user.name":"", "err.code":"name unknown",
"err.detail":common.MapStr{"name":"the-student-hotel/thestudenthotel.cms"},
"err.message":"repository name not known to registry", "go.version":"go1.12.9",
"http.request.host":"registry.gitlab.com",
"http.request.id":"d8c92096-dfe8-4b06-81c6-9c3a8a9a1cbc",
"http.request.method":"GET", "http.request.remoteaddr":"34.74.188.180",
"http.request.uri":"/v2/the-student-hotel/thestudenthotel.cms/tags/list",
"http.request.useragent":"Faraday v0.15.4",
"http.response.contenttype":"application/json",
"http.response.duration":"20.947078ms", "http.response.status":404,
"http.response.written":145, "level":"error", "msg":"response completed with
error", "vars.name":"the-student-hotel/thestudenthotel.cms"},
"labels":common.MapStr{"compute.googleapis.com/resource_name":"gke-gprd-gitlab-gke-node-pool-2019092-539112d4-wszl",
"container.googleapis.com/namespace_name":"gitlab",
"container.googleapis.com/pod_name":"gitlab-registry-7bcf5d864-z76dc",
"container.googleapis.com/stream":"stderr"},
"logName":"projects/gitlab-production/logs/registry",
"receiveTimestamp":"2020-01-20T14:43:44.764458844Z",
"resource":common.MapStr{"labels":common.MapStr{"cluster_name":"gprd-gitlab-gke",
"container_name":"registry", "instance_id":"8414440413714042730",
"namespace_id":"gitlab", "pod_id":"gitlab-registry-7bcf5d864-z76dc",
"project_id":"gitlab-production", "zone":"us-east1-c"}, "type":"container"},
"severity":"ERROR", "timestamp":"2020-01-20T14:43:44.028123409Z"},
"message":"{\"insertId\":\"1nxxvc3g23gyoc6\",\"jsonPayload\":{\"auth.user.name\":\"\",\"err.code\":\"name
unknown\",\"err.detail\":{\"name\":\"the-student-hotel/thestudenthotel.cms\"},\"err.message\":\"repository
name not known to
registry\",\"go.version\":\"go1.12.9\",\"http.request.host\":\"registry.gitlab.com\",\"http.request.id\":\"d8c92096-dfe8-4b06-81c6-9c3a8a9a1cbc\",\"http.request.method\":\"GET\",\"http.request.remoteaddr\":\"34.74.188.180\",\"http.request.uri\":\"/v2/the-student-hotel/thestudenthotel.cms/tags/list\",\"http.request.useragent\":\"Faraday
v0.15.4\",\"http.response.contenttype\":\"application/json\",\"http.response.duration\":\"20.947078ms\",\"http.response.status\":404,\"http.response.written\":145,\"level\":\"error\",\"msg\":\"response
completed with
error\",\"vars.name\":\"the-student-hotel/thestudenthotel.cms\"},\"labels\":{\"compute.googleapis.com/resource_name\":\"gke-gprd-gitlab-gke-node-pool-2019092-539112d4-wszl\",\"container.googleapis.com/namespace_name\":\"gitlab\",\"container.googleapis.com/pod_name\":\"gitlab-registry-7bcf5d864-z76dc\",\"container.googleapis.com/stream\":\"stderr\"},\"logName\":\"projects/gitlab-production/logs/registry\",\"receiveTimestamp\":\"2020-01-20T14:43:44.764458844Z\",\"resource\":{\"labels\":{\"cluster_name\":\"gprd-gitlab-gke\",\"container_name\":\"registry\",\"instance_id\":\"8414440413714042730\",\"namespace_id\":\"gitlab\",\"pod_id\":\"gitlab-registry-7bcf5d864-z76dc\",\"project_id\":\"gitlab-production\",\"zone\":\"us-east1-c\"},\"type\":\"container\"},\"severity\":\"ERROR\",\"timestamp\":\"2020-01-20T14:43:44.028123409Z\"}",
"message_id":"969735366213238", "publish_time":common.Time{wall:0x1b3d4440,
ext:63715128225, loc:(*time.Location)(nil)},
"type":"pubsub-duplicate-gke-inf-gprd"}, Private:interface {}(nil),
TimeSeries:false}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}}
(status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field
[json.jsonPayload.err.detail] of type [text] in document with id
'zvdow28BwvWOXgWL2t1c'. Preview of field's value:
'{name=the-student-hotel/thestudenthotel.cms}'","caused_by":{"type":"illegal_state_exception","reason":"Can't
get text on a START_OBJECT at 1:2214"}}
I would have split this up into nonprod then prod cluster MRs, but there are not enough gstg registry logs to reproduce the indexing problem in the beat logs.
If this is merged, I plan to:
- Roll over the prod GKE index alias
- Check GKE logs are still being processed
- Check pubsubbeat logs for whether we are still seeing errors due to err.detail, as we see above.
- Validate whether json.jsonPayload.err.detail is still present in ES logs.
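
The rollover step matters because index templates are applied only when an index is created, so the existing write index keeps its old mapping until a new one is rolled in. As a rough sketch, the rollover call is a POST to the alias's `_rollover` endpoint; with no conditions in the body, Elasticsearch rolls over unconditionally. The cluster URL and alias name below are placeholders, not the real production values, and the request is only built, not sent.

```python
import urllib.request


def build_rollover_request(es_url, alias):
    """Build (but do not send) a rollover request for `alias`.
    Hypothetical helper; authentication is omitted."""
    url = f"{es_url.rstrip('/')}/{alias}/_rollover"
    # No request body: without conditions the rollover is unconditional.
    return urllib.request.Request(url, method="POST")


# Placeholder values for illustration only.
request = build_rollover_request("https://es.example.invalid:9243/", "example-gke-alias")
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` with whatever credentials the cluster requires.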