Introduce unique instance_id
Problem
In https://gitlab.com/gitlab-org/gitlab/-/issues/498380#note_2343131661 we discovered that instance IDs (uuid) are sometimes shared across different GitLab instances. The combined key of (hostname, uuid) cannot be trusted either.
We currently use Gitlab::GlobalAnonymousId.instance_id as the instance_id. Since this ID is stored in the database, it can easily be duplicated when a preconfigured installation is shared between multiple deployments.
Having shared instance IDs leads to inaccurate data in both Service Ping and Snowplow events from Service Monitoring, as we cannot distinguish events coming from different instances.
Desired Outcome
We have a mechanism to uniquely and consistently identify GitLab instances in Service Ping and Snowplow events, even when installations are cloned or copied.
Potential Solution
Generate a unique instance ID using Gitlab::CryptoHelper.sha256('some-constant-string'). (Let's refer to this as hashed_id for now.)
This approach leverages the fact that SHA-256 is a cryptographic hash function, and Gitlab::CryptoHelper.sha256 is salted with the encrypted db_key_base (see Docs about Secret Entries). This means hashed_id will only be identical if instances share the exact same db_key_base.
Since we strongly discourage users from rotating the db_key_base, we expect ID changes to be rare. (TODO: Can we verify this?)
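The behaviour described above can be sketched with Ruby's standard library alone. This is a minimal illustration, not the actual Gitlab::CryptoHelper implementation: salted_sha256 is a hypothetical stand-in, the key strings are made up, and the real helper may combine value and salt differently.

```ruby
require 'digest'

# Hypothetical stand-in for Gitlab::CryptoHelper.sha256: hashes a value
# together with a secret salt. The real helper may differ in detail.
def salted_sha256(value, salt)
  Digest::SHA256.base64digest("#{value}#{salt}")
end

CONSTANT = 'some-constant-string'

# Made-up key material, for illustration only.
instance_a = salted_sha256(CONSTANT, 'key-base-of-instance-a')
clone_of_a = salted_sha256(CONSTANT, 'key-base-of-instance-a') # cloned together with its secrets
instance_b = salted_sha256(CONSTANT, 'key-base-of-instance-b')
rotated_a  = salted_sha256(CONSTANT, 'rotated-key-base-of-a')

puts instance_a == clone_of_a # same key, same ID
puts instance_a == instance_b # different key, different ID
puts instance_a == rotated_a  # rotated key, ID changes
```

The three comparisons print true, false, and false. They show both the intended property (distinct keys give distinct IDs) and the two caveats: a clone that keeps its secrets keeps its ID, and a key rotation changes it.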
hashed_id should be added to the Service Ping and the Standard Context of Snowplow events.
Optional: Turn hashed_id into a UUID using Gitlab::UUID.v5(hashed_id) if that is helpful in the data warehouse.
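For readers unfamiliar with version 5 UUIDs, the following is a rough standard-library sketch of how a name-based UUID is derived under RFC 4122. It is not the Gitlab::UUID implementation. The uuid_v5 helper, the choice of the RFC's DNS namespace, and the sample hashed_id value are assumptions made for illustration; Gitlab::UUID.v5 may use a different namespace.

```ruby
require 'digest'

# RFC 4122 DNS namespace, used here only as an example namespace.
NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8'

# Hypothetical helper: derive a version 5 (SHA-1, name-based) UUID.
def uuid_v5(namespace_uuid, name)
  ns_bytes = [namespace_uuid.delete('-')].pack('H*')  # namespace as 16 raw bytes
  bytes = Digest::SHA1.digest(ns_bytes + name.b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50                 # set version to 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80                 # set RFC 4122 variant
  hex = bytes.pack('C*').unpack1('H*')
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

hashed_id = 'example-hashed-id' # placeholder, not a real instance ID
puts uuid_v5(NAMESPACE_DNS, hashed_id)
```

Because the derivation is deterministic, the same hashed_id always maps to the same UUID, so the conversion changes the representation without weakening or strengthening uniqueness.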
Potential problems and remedies
- db_key_base rotation results in new instance IDs: We could detect key rotations in the Version app by continuing to send the original Gitlab::GlobalAnonymousId.instance_id alongside the new hashed ID.
- Shared db_key_base between instances: We may need to implement detection mechanisms in the Version app to flag such instances.
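Both remedies could be approached as a grouping check over received pings. The sketch below is purely illustrative: the Ping struct, its field names, and the ambiguous_groups helper are hypothetical and do not reflect the Version app's actual data model. Hostname is used only as a weak signal, since the (hostname, uuid) pair is known to be unreliable.

```ruby
# Hypothetical ping record; field names are illustrative only.
Ping = Struct.new(:legacy_id, :hashed_id, :hostname, keyword_init: true)

# Groups pings by one field and returns the groups that carry more than
# one distinct value of another field.
def ambiguous_groups(pings, by:, distinct:)
  pings.group_by { |ping| ping[by] }
       .transform_values { |group| group.map { |ping| ping[distinct] }.uniq }
       .select { |_key, values| values.size > 1 }
end

pings = [
  Ping.new(legacy_id: 'uuid-1', hashed_id: 'hash-a', hostname: 'one.example.com'),
  Ping.new(legacy_id: 'uuid-1', hashed_id: 'hash-b', hostname: 'one.example.com'),
  Ping.new(legacy_id: 'uuid-2', hashed_id: 'hash-c', hostname: 'two.example.com'),
  Ping.new(legacy_id: 'uuid-3', hashed_id: 'hash-c', hostname: 'three.example.com')
]

# One legacy ID with several hashed IDs: possible key rotation, or clones
# that were given distinct keys.
p ambiguous_groups(pings, by: :legacy_id, distinct: :hashed_id)

# One hashed ID reported from several hostnames: possible shared db_key_base.
p ambiguous_groups(pings, by: :hashed_id, distinct: :hostname)
```

A check like this can only flag candidates for review. A clone that shares both the database and the db_key_base and reports the same hostname would remain indistinguishable on these fields.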
Documentation Update Needed
- Yes
- No