Do not index a malformed field in ES

The json.jsonPayload.err.detail field is found in the GKE indexes. It is emitted by the Docker registry and is sometimes a string, sometimes an object. If we stop indexing it, we will no longer be able to search on it, but all logs should end up in ES instead of some of them being dropped because of the type mismatch.
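
For illustration, one way to leave a field unindexed in Elasticsearch is to map it as an object with enabled set to false. Elasticsearch then skips parsing the value, so both the string and the object form are accepted and kept in _source, but neither is searchable. The sketch below only builds such a mapping fragment as a Python dict. It is an assumption about how this could be done, not necessarily the exact change in this MR, and the helper name disabled_field_mapping is hypothetical.

```python
import json


def disabled_field_mapping(dotted_path):
    """Build a nested Elasticsearch mapping fragment for a dotted field path.

    The leaf field is mapped as an object with "enabled": false, so its
    value is stored in _source but never parsed or indexed.
    (Hypothetical helper, for illustration only.)
    """
    # Start from the leaf and wrap it in one "properties" level per path part.
    mapping = {"type": "object", "enabled": False}
    for name in reversed(dotted_path.split(".")):
        mapping = {"properties": {name: mapping}}
    return mapping


# Field name taken from the mapper_parsing_exception quoted in this MR.
mapping = disabled_field_mapping("json.jsonPayload.err.detail")
print(json.dumps(mapping, indent=2))
```

The type of an already-mapped field cannot be changed in place, so a mapping like this only takes effect on a newly created index.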


Related: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8931

The type of error this is trying to squash:

2020-01-20_14:43:46.70264 2020-01-20T14:43:46.702Z      WARN    elasticsearch/client.go:511     Cannot index event

publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xbf818d0863a4d982,
ext:256079236860397, loc:(*time.Location)(0x3dcf440)}, Meta:common.MapStr(nil),
Fields:common.MapStr{"agent":common.MapStr{"ephemeral_id":"cd9ba012-45e1-4430-847e-c10b01042090",
"hostname":"pubsub-duplicate-gke-inf-gprd",
"id":"a7d37e08-f3d2-45a7-baea-9568c0d2ca74", "type":"pubsubbeat",
"version":"7.5.1"},
"attributes":common.MapStr{"logging.googleapis.com/timestamp":"2020-01-20T14:43:44.028123409Z"},
"ecs":common.MapStr{"version":"1.4.0"},
"host":common.MapStr{"name":"pubsub-duplicate-gke-inf-gprd"},
"json":common.MapStr{"insertId":"1nxxvc3g23gyoc6",
"jsonPayload":common.MapStr{"auth.user.name":"", "err.code":"name unknown",
"err.detail":common.MapStr{"name":"the-student-hotel/thestudenthotel.cms"},
"err.message":"repository name not known to registry", "go.version":"go1.12.9",
"http.request.host":"registry.gitlab.com",
"http.request.id":"d8c92096-dfe8-4b06-81c6-9c3a8a9a1cbc",
"http.request.method":"GET", "http.request.remoteaddr":"34.74.188.180",
"http.request.uri":"/v2/the-student-hotel/thestudenthotel.cms/tags/list",
"http.request.useragent":"Faraday v0.15.4",
"http.response.contenttype":"application/json",
"http.response.duration":"20.947078ms", "http.response.status":404,
"http.response.written":145, "level":"error", "msg":"response completed with
error", "vars.name":"the-student-hotel/thestudenthotel.cms"},
"labels":common.MapStr{"compute.googleapis.com/resource_name":"gke-gprd-gitlab-gke-node-pool-2019092-539112d4-wszl",
"container.googleapis.com/namespace_name":"gitlab",
"container.googleapis.com/pod_name":"gitlab-registry-7bcf5d864-z76dc",
"container.googleapis.com/stream":"stderr"},
"logName":"projects/gitlab-production/logs/registry",
"receiveTimestamp":"2020-01-20T14:43:44.764458844Z",
"resource":common.MapStr{"labels":common.MapStr{"cluster_name":"gprd-gitlab-gke",
"container_name":"registry", "instance_id":"8414440413714042730",
"namespace_id":"gitlab", "pod_id":"gitlab-registry-7bcf5d864-z76dc",
"project_id":"gitlab-production", "zone":"us-east1-c"}, "type":"container"},
"severity":"ERROR", "timestamp":"2020-01-20T14:43:44.028123409Z"},
"message":"{\"insertId\":\"1nxxvc3g23gyoc6\",\"jsonPayload\":{\"auth.user.name\":\"\",\"err.code\":\"name
unknown\",\"err.detail\":{\"name\":\"the-student-hotel/thestudenthotel.cms\"},\"err.message\":\"repository
name not known to
registry\",\"go.version\":\"go1.12.9\",\"http.request.host\":\"registry.gitlab.com\",\"http.request.id\":\"d8c92096-dfe8-4b06-81c6-9c3a8a9a1cbc\",\"http.request.method\":\"GET\",\"http.request.remoteaddr\":\"34.74.188.180\",\"http.request.uri\":\"/v2/the-student-hotel/thestudenthotel.cms/tags/list\",\"http.request.useragent\":\"Faraday
v0.15.4\",\"http.response.contenttype\":\"application/json\",\"http.response.duration\":\"20.947078ms\",\"http.response.status\":404,\"http.response.written\":145,\"level\":\"error\",\"msg\":\"response
completed with
error\",\"vars.name\":\"the-student-hotel/thestudenthotel.cms\"},\"labels\":{\"compute.googleapis.com/resource_name\":\"gke-gprd-gitlab-gke-node-pool-2019092-539112d4-wszl\",\"container.googleapis.com/namespace_name\":\"gitlab\",\"container.googleapis.com/pod_name\":\"gitlab-registry-7bcf5d864-z76dc\",\"container.googleapis.com/stream\":\"stderr\"},\"logName\":\"projects/gitlab-production/logs/registry\",\"receiveTimestamp\":\"2020-01-20T14:43:44.764458844Z\",\"resource\":{\"labels\":{\"cluster_name\":\"gprd-gitlab-gke\",\"container_name\":\"registry\",\"instance_id\":\"8414440413714042730\",\"namespace_id\":\"gitlab\",\"pod_id\":\"gitlab-registry-7bcf5d864-z76dc\",\"project_id\":\"gitlab-production\",\"zone\":\"us-east1-c\"},\"type\":\"container\"},\"severity\":\"ERROR\",\"timestamp\":\"2020-01-20T14:43:44.028123409Z\"}",
"message_id":"969735366213238", "publish_time":common.Time{wall:0x1b3d4440,
ext:63715128225, loc:(*time.Location)(nil)},
"type":"pubsub-duplicate-gke-inf-gprd"}, Private:interface {}(nil),
TimeSeries:false}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}}
(status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field
[json.jsonPayload.err.detail] of type [text] in document with id
'zvdow28BwvWOXgWL2t1c'. Preview of field's value:
'{name=the-student-hotel/thestudenthotel.cms}'","caused_by":{"type":"illegal_state_exception","reason":"Can't
get text on a START_OBJECT at 1:2214"}}

I would have split this up into nonprod then prod cluster MRs, but there are not enough gstg registry logs to reproduce the indexing problem in the beat logs.

If this is merged, I plan to:

  1. Roll over the prod GKE index alias.
  2. Check that GKE logs are still being processed.
  3. Check the pubsubbeat logs to confirm whether we are still seeing the err.detail errors shown above.
  4. Validate whether json.jsonPayload.err.detail is still present in ES logs.
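
The pubsubbeat log check in step 3 could be sketched roughly as follows. This is a minimal sketch that assumes each beat log event is on a single line and contains the "failed to parse field [...]" text quoted above. The function name count_parse_failures and the abbreviated sample lines are made up for illustration and are not real log output.

```python
import re

# Matches the Elasticsearch error text quoted in this MR, e.g.
# "failed to parse field [json.jsonPayload.err.detail] of type [text]".
PARSE_FAILURE = re.compile(
    r"failed to parse field \[(?P<field>[^\]]+)\] of type \[(?P<type>[^\]]+)\]"
)


def count_parse_failures(lines, field="json.jsonPayload.err.detail"):
    """Count log lines reporting a mapper parsing failure for `field`.

    (Hypothetical helper, for illustration only.)
    """
    count = 0
    for line in lines:
        match = PARSE_FAILURE.search(line)
        if match and match.group("field") == field:
            count += 1
    return count


# Made-up, heavily abbreviated sample lines (not real log output).
sample_lines = [
    "WARN Cannot index event ... failed to parse field [json.jsonPayload.err.detail] of type [text] ...",
    "INFO unrelated line",
    "WARN Cannot index event ... failed to parse field [json.other] of type [long] ...",
]

failures = count_parse_failures(sample_lines)
print(failures)
```

A count of zero over a window after the rollover would suggest the err.detail errors have stopped. It says nothing about other fields that may fail to parse.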
