Reduce the number of mergeability.* fields in merge_request logs being ingested by ElasticSearch

Background Context

I've been investigating the possibility of having a strongly-defined schema for the fields that we emit via our logging libraries. As part of this investigation, it was raised to me that field limit issues are a recurring problem within our dedicated setup and cause a number of incidents for that team.

This is a rather frustrating source of toil, and it is likely contributing to reputational damage for our product, given that customers are escalating it to us in the form of incidents.

Mitigation

One of the ways we can mitigate this is to be more deliberate about the names of the fields that we use within our observability setup. We should avoid high-cardinality field names and instead opt for low-cardinality names that carry any high-cardinality data as field values.

For example, we currently have field names that follow the pattern:

json.properties.mergeability.{mergeabilityname}.{subfieldnames}

If we can reduce this down to something like:

json.properties.mergeability.name = {mergeabilityname}

and flatten the subfields like so:

json.properties.mergeability.{subfieldnames...}

Then we will drastically reduce the total number of fields being indexed by our Elasticsearch offering whilst still retaining the contextual data needed to investigate issues.
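
To make the difference concrete, here is a minimal Ruby sketch of the two shapes. It is not the code in logger.rb: the check names (`check_ci_status`, `check_conflict`), the subfield names (`duration_s`, `db_count`) and both helper methods are hypothetical, and the `json.properties.` prefix is omitted for brevity. It also assumes one log entry is emitted per mergeability check, which is one possible reading of the proposal.

```ruby
# Hypothetical input: per-check data keyed by check name.
# Check names and subfield names are illustrative only.
nested = {
  "check_ci_status" => { "duration_s" => 0.12, "db_count" => 3 },
  "check_conflict"  => { "duration_s" => 0.05, "db_count" => 1 }
}

# Current pattern: one distinct field name per (check, subfield) pair.
def nested_field_names(checks)
  checks.flat_map do |name, subfields|
    subfields.keys.map { |sub| "mergeability.#{name}.#{sub}" }
  end
end

# Proposed pattern: one entry per check, with the check name stored as a value
# and the subfields sharing the same field names across all checks.
def flattened_entries(checks)
  checks.map do |name, subfields|
    entry = { "mergeability.name" => name }
    subfields.each { |sub, value| entry["mergeability.#{sub}"] = value }
    entry
  end
end

current_fields  = nested_field_names(nested)
proposed_fields = flattened_entries(nested).flat_map(&:keys).uniq

puts current_fields.size   # 4 distinct field names (2 checks x 2 subfields)
puts proposed_fields.size  # 3 distinct field names (name + 2 subfields)
```

In this sketch, adding a third check raises the nested count to 6, while the flattened count stays at 3. A subfield that is itself called `name` would collide with `mergeability.name`, so a real implementation would need to guard against that.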

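This also means the number of indexed fields no longer grows every time a mergeability type is added: today each new type adds a full set of subfield names, whereas with the flattened layout it only adds a new value for the name field.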

We believe the code responsible for this explosion in field names is here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/merge_requests/mergeability/logger.rb

Supporting Evidence

@igorwwwwwwwwwwwwwwwwwwww has run a few curl commands to generate flame graphs that capture the fields we currently index today.

mergeability.* fields account for 2.7% of all fields indexed for our dotcom offering -

rails_mappings.svg

From our sidekiq index, they account for 4.68% -

sidekiq_mappings.svg