Log message improvements
Problem Description
Rails provides a single logging interface for much of its monolith, and it differs from the one Gitaly uses. This creates problems for people who monitor these systems and expect a unified experience. Rails, for example, uses the error key as an object, and each subkey receives certain data when it exists. For log systems such as Elasticsearch, this means the dynamic field mapper builds an index that expects error to be an object, with any subkey, such as error.class, of type text. This is problematic when a system uses the same index for both our Rails and Gitaly services. Gitaly sends the error key as a string containing the error message, which fails to match the dynamic mapping created by Elasticsearch. In most cases, this means the log message is dropped, which in turn creates a visibility problem when using a log system to investigate issues.
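The conflict described above can be sketched with a toy model. This is not Elasticsearch and not a real client: the ToyIndex class, the field_type helper, and the sample documents (including the ExampleError class name) are hypothetical, and the model only captures the idea that the first document seen fixes a field's type and a later document with a conflicting type is rejected.

```python
def field_type(value):
    """Simplified type name: dicts are "object", everything else is lumped as "text"."""
    return "object" if isinstance(value, dict) else "text"


class ToyIndex:
    """Toy stand-in for an index with dynamic mapping (an assumption, not a real API)."""

    def __init__(self):
        self.mapping = {}   # field name -> type fixed by the first document seen
        self.accepted = []
        self.rejected = []

    def ingest(self, doc):
        # Reject the whole document if any field conflicts with the known mapping.
        for key, value in doc.items():
            known = self.mapping.get(key)
            if known is not None and known != field_type(value):
                self.rejected.append(doc)
                return False
        # No conflict: record types for any fields not seen before.
        for key, value in doc.items():
            self.mapping.setdefault(key, field_type(value))
        self.accepted.append(doc)
        return True


index = ToyIndex()
rails_doc = {"error": {"class": "ExampleError"}}   # Rails-style: error is an object
gitaly_doc = {"error": "tree entry not found"}     # Gitaly-style: error is a string

rails_ok = index.ingest(rails_doc)    # first document fixes error as an object
gitaly_ok = index.ingest(gitaly_doc)  # the string value conflicts with that mapping
```

In this model the outcome depends on which service logs first: whichever shape arrives first wins, and documents from the other service are the ones that get rejected.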
Proposed Solution
Unify the logging mechanism inside Gitaly to better match that of GitLab Rails, providing a more unified experience for log pipelines that cannot work around the formatting limitations imposed by systems such as Elasticsearch. When we send an error to our logs, let's create a set of objects that can capture the appropriate information. Example log document, trimmed for brevity:
{
"component": "gitaly.StreamServerInterceptor",
"correlation_id": "01HTQ7F5SZ2ETNKYN3PZKXRQSZ",
- "error": "tree entry not found",
- "error_metadata": {
- "path": ".gitlab-ci.yml"
+ "error": {
+ "message": "tree entry not found",
+ "metadata": {
+ "path": ".gitlab-ci.yml"
+ }
},
"level": "info",
"msg": "finished streaming call with code NotFound",
"pid": 428081,
"span.kind": "server",
"system": "grpc",
"time": "2024-04-05T13:34:17.088Z",
"user_id": "1",
"username": "root"
}
The above creates a single error key of type object, with subkeys containing all relevant information for an error message.
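As a rough illustration of the reshaping shown in the example document, here is a minimal sketch that folds the flat error and error_metadata fields into a single error object. The nest_error function is a hypothetical name, and this is not Gitaly's implementation (Gitaly is written in Go, so the real change would live in its logging code). The field names are taken from the example above.

```python
def nest_error(doc):
    """Return a copy of doc with flat error fields folded into one error object."""
    # Copy everything except the two fields being reshaped.
    out = {k: v for k, v in doc.items() if k not in ("error", "error_metadata")}
    error = doc.get("error")
    metadata = doc.get("error_metadata")

    if error is None and metadata is None:
        return out  # nothing to reshape

    if isinstance(error, dict):
        nested = dict(error)               # already an object: keep its subkeys
    elif error is None:
        nested = {}                        # metadata without a message
    else:
        nested = {"message": str(error)}   # plain string becomes error.message

    if metadata is not None:
        nested["metadata"] = metadata
    out["error"] = nested
    return out


flat = {
    "component": "gitaly.StreamServerInterceptor",
    "error": "tree entry not found",
    "error_metadata": {"path": ".gitlab-ci.yml"},
    "level": "info",
}
nested = nest_error(flat)
```

The same kind of transform could also run in whatever system gathers and ships the logs, which would keep the index consistent without changing the emitting service.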
This is only a proposal. I'm raising it because I recently dealt with a scenario where messages were being dropped from the Dedicated product: a single index contained both Rails and Gitaly logs, so some logs from either system were dropped due to dynamic mapping configurations. This can be solved on the customer side by adjusting the system that gathers and sends logs to Elasticsearch. We proposed this for the Dedicated product here: https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/instrumentor/-/merge_requests/3027
I think this could be a worthwhile thing to do for the sake of Self-Managed customers. Do note that nothing currently ties log formats together across services, so if one system changes, keeping all systems unified becomes a large, annoying task (which makes me wonder whether this should be part of LabKit).