Skip to content

Add feature flag field to Gitaly logs

When debugging Gitaly feature flags on production, one of the biggest difficulties is to determine whether a flag is on/off for a particular call. We don't put this information in the logs. Although we have time series metrics about feature flag evaluation, they are barely useful for debugging purpose.

In the code case, we implemented a pair of featureflag middlewares to log out the flags as a field of the logger. Unfortunately, the volume of logs is too huge that it pushes Elasticsearch to limit. We need to find another way. There are some rooms for improvement I observed:

  • The middlewares stringify the flags using flag_a:true flag_b:false flag_c:true. This format is not compact. Disabled flags can be implied if the name is not there. Afterward, the log field becomes flag_a flag_c.
  • The middleware attaches the feature_flag field to the logger. As a result, all subsequent logs will have this field. In most cases, adding the field after the call finished is enough. Converting the middlewares to FieldsProducer will remove redundant info.
  • We may not need to attach this field to every logs. Attaching it to RPC failure is a good start. As a result, I added AlwaysLogFeatureFlag flag for this purpose.

Closes #4622 (closed)

Testing scenarios

All the following tests are conducted using the following feature flags:

[68] pry(main)> Feature::Gitaly.server_feature_flags
=> {"gitaly-feature-mep-mep"=>"true",
 "gitaly-feature-go-find-license"=>"true",
 "gitaly-feature-node-error-cancels-voter"=>"false",
 "gitaly-feature-always-log-feature-flags"=>"true"}

The result is intercepted from Gitaly logs. I use jq to make them nicer for reading.

 gdk tail gitaly |  jq --stream --unbuffered --raw-input 'split("{")|.[1:]|join("{")|"{" + .|fromjson|{feature_flags, "grpc.request.fullMethod", "grpc.meta.client_name", "grpc.code", "msg"}'

feature_flag field is not included by default

  • Unary call

Screenshot_2022-11-15_at_12.59.09

  • Streaming call

Screenshot_2022-11-15_at_13.58.44

When a RPC fails, the logs attach this field. Other logs in the same RPC don't have this field, though

  • Unary call

Screenshot_2022-11-15_at_13.13.07

  • Streaming call

Screenshot_2022-11-15_at_14.00.15

When AlwaysLogFeatureFlag is on, this field is always attached

  • Unary call

Screenshot_2022-11-15_at_12.59.54

  • Streaming call

Screenshot_2022-11-15_at_13.12.54

Edited by Quang-Minh Nguyen

Merge request reports