Emit context metadata through response headers from Rails to Workhorse

Context metadata is now included on all Sidekiq jobs and is helping us understand our workloads much better.

This metadata is generated in Rails and pushed into the Sidekiq context.

Unfortunately, Workhorse has no access to this metadata, because it cannot (for example) understand Rails URLs or decode cookies. Nor should it.

However, from an accounting, logging, security and observability point of view, knowing so little about the request is a shortcoming of Workhorse.

From an observability point of view, monitoring at the Workhorse tier is preferable because it includes Ruby process queuing times and time spent performing actions offloaded from Rails (for example, archive generation and object storage).

Proposal: Emit context metadata from Rails to Workhorse

Step 1: A Labkit-Ruby middleware, in Rails, emits the context metadata through response headers. Each metadata field is converted into a header with an X-Gitlab-Context- prefix, as follows:

GET /
...

200 OK
...
X-Gitlab-Context-Namespace: gitlab-org
X-Gitlab-Context-User: marin
...

We already have the X-Gitlab-Feature-Category header, which we can rename.

Step 2: A Labkit RoundTripper (HTTP client middleware) in Workhorse strips the X-Gitlab-Context-* headers from the response, so that they are not accessible to the client, and passes them on (through context.Context, for example) to the HTTP access logger.
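
A minimal Go sketch of Step 2, under stated assumptions: the names contextRoundTripper, Meta, WithMeta, staticTransport and demo are hypothetical and are not Labkit's actual API. It wraps an http.RoundTripper, copies each X-Gitlab-Context-* response header into a holder stored on the request's context.Context, and deletes the header before the response is passed on. staticTransport stands in for Rails so that the example needs no network.

```go
package main

import (
	"context"
	"net/http"
	"strings"
)

// contextHeaderPrefix is the prefix from the proposal, in Go's canonical header form.
const contextHeaderPrefix = "X-Gitlab-Context-"

// metaKey is a private context key type (hypothetical, for illustration).
type metaKey struct{}

// Meta collects the context metadata fields seen for one request.
type Meta map[string]string

// WithMeta returns a context carrying an empty, mutable Meta holder.
func WithMeta(ctx context.Context) (context.Context, Meta) {
	m := Meta{}
	return context.WithValue(ctx, metaKey{}, m), m
}

// contextRoundTripper (hypothetical name) strips X-Gitlab-Context-* headers
// from the upstream response and records them in the request's Meta holder.
type contextRoundTripper struct {
	next http.RoundTripper
}

func (rt contextRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := rt.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	// meta is nil when the caller did not install a holder with WithMeta.
	meta, _ := req.Context().Value(metaKey{}).(Meta)
	for name, values := range resp.Header {
		if !strings.HasPrefix(name, contextHeaderPrefix) {
			continue
		}
		if meta != nil && len(values) > 0 {
			field := strings.ToLower(strings.TrimPrefix(name, contextHeaderPrefix))
			meta[field] = values[0]
		}
		// Always strip the header so it never reaches the client.
		resp.Header.Del(name)
	}
	return resp, nil
}

// staticTransport is a stand-in for the Rails backend (assumption, demo only).
type staticTransport struct{}

func (staticTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	h := http.Header{}
	h.Set("X-Gitlab-Context-Namespace", "gitlab-org")
	h.Set("X-Gitlab-Context-User", "marin")
	h.Set("Content-Type", "text/html")
	return &http.Response{StatusCode: http.StatusOK, Header: h, Body: http.NoBody, Request: req}, nil
}

// demo runs one request through the round tripper and returns the headers
// the client would see together with the captured metadata.
func demo() (http.Header, Meta, error) {
	ctx, meta := WithMeta(context.Background())
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://rails.invalid/", nil)
	if err != nil {
		return nil, nil, err
	}
	resp, err := contextRoundTripper{next: staticTransport{}}.RoundTrip(req)
	if err != nil {
		return nil, nil, err
	}
	return resp.Header, meta, nil
}
```

In this sketch, demo() returns response headers with no X-Gitlab-Context-* entries, while the Meta holder contains the namespace and user fields. Storing a mutable holder in the context is one possible way to hand the fields to an access logger that runs after the round trip; it is a design assumption, not something the proposal prescribes.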

The access logger includes the metadata fields in the Workhorse HTTP access logs:

{ "status_code": "200", "meta": { "namespace": "gitlab-org", "user": "marin", ... } ... }
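
As a rough illustration of the log shape above, the following Go sketch serializes a status code and a metadata map into a JSON line. The accessLogEntry type and formatAccessLog function are hypothetical; the actual Workhorse access logger has many more fields and may be structured quite differently.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// accessLogEntry (hypothetical) models only the two fields shown in the example.
type accessLogEntry struct {
	// The example log line carries the status code as a string.
	StatusCode string `json:"status_code"`
	// Meta is omitted from the output when no context metadata was received.
	Meta map[string]string `json:"meta,omitempty"`
}

// formatAccessLog renders one access log line as JSON.
// encoding/json sorts map keys, so the output is deterministic.
func formatAccessLog(status int, meta map[string]string) (string, error) {
	entry := accessLogEntry{StatusCode: strconv.Itoa(status), Meta: meta}
	b, err := json.Marshal(entry)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Calling formatAccessLog(200, map[string]string{"namespace": "gitlab-org", "user": "marin"}) yields a line with the same status_code and meta fields as the example; requests for which Rails sent no context headers simply produce a line without meta.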

What would we use this for?

A second-generation marquee customer monitoring system could be based on the meta.namespace field. Not only would this be far more efficient, it would also cover more cases, including API access.
Better analysis of usage patterns for users across the full request lifecycle. Currently, much of our analysis for users is based on what happens in Rails. To understand the full impact, we need to use the correlation_id, which makes the analysis much more complicated.

Edited Nov 17, 2020 by Sean McGivern