Correlation_id mismatch between Workhorse and Rails logs

Problem statement

Under some conditions yet to be discovered, Workhorse and Rails end up using different correlation_id values for the same transaction. This makes it more difficult and time-consuming to trace a transaction across service layers.

For at least the last 7 days (probably longer), this correlation_id mismatch has been affecting HTTP requests to Repositories::GitHttpController.

Example

To demonstrate the problem, run a git-over-http command on some distinctive repo (in this example an empty repo I created recently). In Kibana, query the relevant timespan on the index for workhorse (pubsub-workhorse-inf-gprd-*), showing requests involving that repo (filtering to your client IP address if necessary). Repeat the equivalent Kibana query on the index for rails (pubsub-rails-inf-gprd-*). Note that the json.correlation_id field has different values (and different formats) in workhorse versus rails.

Client command:

Any client-side git-over-http command will suffice. Here we run a git clone on a distinctive repo.

msmiley@saoirse:~$ date -u ; git clone https://gitlab.com/msmiley/test-966.git 
Thu Sep 17 00:16:46 UTC 2020
Cloning into 'test-966'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.

Workhorse log entries:

https://log.gprd.gitlab.net/goto/126f29f1f656f9a221b1b7b60a7577d1

Screenshot_from_2020-09-16_17-32-31

Rails log entries:

These correspond to the above but using distinct correlation_id values.

https://log.gprd.gitlab.net/goto/5ee40923622f2235c79d28b2327fae05

Screenshot_from_2020-09-16_17-33-08

Background

Ideally to support tracing across service calls we want a trace id (correlation_id) to be set as early in the call chain as possible, but because there are multiple ways for requests to enter the system, several components are capable of generating a new correlation_id if one does not already exist on the incoming request.

The most common generator of correlation_ids is Workhorse, which uses the golang version of LabKit to generate a base64-encoded 11-character string representing a random 64-bit integer (e.g. NkD7IkM5XY4 from the above example).

Another correlation_id generator is the ruby version of LabKit, which generates a 128-bit hex string by calling SecureRandom.hex (e.g. 36aeacd2d3bb6a2cb7c9d8c3717ced0b). The GitLab Rails app uses LabKit's correlation_id in places like class ApplicationController, but that is not what generates the correlation_id values we see in the example above.

A third format of correlation_id is a UUID (e.g. fdbc97f4-76f4-4646-a531-8c5f03927b53 from the above example). I am not sure where this gets generated.