Cells: UUIDs in Cloud Connector usage data
Problem
For Cloud Connector usage data, GitLab instances currently send their instance UUID in an HTTP header to backend services with each request. We use this UUID as a proxy for customer identity, so we can count how many customers (instances) use a particular Cloud Connector feature each day. The UUID is seeded once during installation or first boot and remains stable unless it is manually reset or the database is wiped. For gitlab.com, there is a single UUID that we use to count usage for our multi-tenant deployment.
During Cells 1.0 discovery, we found that each Cell maintains its own UUID. Unless we address this, once Cells 1.0 launches, usage originating from secondary Cells will appear to come from separate instances, so gitlab.com will be counted more than once.
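A minimal sketch of the daily count described above, and of how per-Cell UUIDs inflate it. It assumes a hypothetical usage record with `day`, `feature`, `instance_uuid` and `host_name` fields; the real event schema, feature names and UUID values are not specified here and are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical record shape; field names are illustrative, not the real schema.
    day: str            # e.g. "2024-05-01"
    feature: str        # Cloud Connector feature name (hypothetical values below)
    instance_uuid: str  # value sent in the instance UUID HTTP header
    host_name: str      # corresponds to the tracked gitlab_host_name


def daily_instance_counts(events):
    """Count distinct instance UUIDs per (day, feature)."""
    seen = defaultdict(set)
    for e in events:
        seen[(e.day, e.feature)].add(e.instance_uuid)
    return {key: len(uuids) for key, uuids in seen.items()}


# Two gitlab.com Cells report different UUIDs, plus one self-managed instance.
events = [
    UsageEvent("2024-05-01", "feature_a", "uuid-cell-1", "gitlab.com"),
    UsageEvent("2024-05-01", "feature_a", "uuid-cell-2", "gitlab.com"),
    UsageEvent("2024-05-01", "feature_a", "uuid-sm-1", "gitlab.example.com"),
]
print(daily_instance_counts(events))
```

Only two customers used `feature_a` that day, but counting by UUID reports three, because each gitlab.com Cell contributes its own UUID.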
Solution
There are several ways to address this:
- Do nothing. We accept this as a minor inconvenience and/or correct for it in ETL or dashboards by attributing all of these UUIDs to gitlab.com. Since we also track the `gitlab_host_name`, this should be possible.
- Do not use UUIDs. We move away from UUIDs in favor of a cross-Cell datum that is more stable. One option is to use the instance host name instead. For gitlab.com this would work well; I am not sure whether it works well for self-managed instances or what other implications it may have.
- Let Cells share a UUID. This is a more technical approach and would require working with the Tenant Scale group to figure out how to provide a cluster-wide ID that we could use for tracking instead.
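A rough sketch of how the first two options could look in an ETL step: the count is parameterized by an identity function, and two candidate identities are compared. It assumes a hypothetical record shape and assumes every gitlab.com Cell reports `gitlab.com` as its host name; `GITLAB_COM_HOSTS` and the helper names are illustrative, not existing code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical record shape; field names are illustrative, not the real schema.
    day: str
    feature: str
    instance_uuid: str
    host_name: str  # corresponds to the tracked gitlab_host_name


# Assumption: every gitlab.com Cell reports this host name.
GITLAB_COM_HOSTS = {"gitlab.com"}


def daily_customer_counts(events, identity):
    """Count distinct customer identities per (day, feature)."""
    seen = defaultdict(set)
    for e in events:
        seen[(e.day, e.feature)].add(identity(e))
    return {key: len(ids) for key, ids in seen.items()}


def uuid_with_gitlab_com_collapsed(e):
    # Option 1: keep UUIDs, but attribute every gitlab.com Cell to one identity.
    if e.host_name in GITLAB_COM_HOSTS:
        return "gitlab.com"
    return e.instance_uuid


def host_name_identity(e):
    # Option 2: identify customers by host name instead of UUID.
    return e.host_name


events = [
    UsageEvent("2024-05-01", "feature_a", "uuid-cell-1", "gitlab.com"),
    UsageEvent("2024-05-01", "feature_a", "uuid-cell-2", "gitlab.com"),
    UsageEvent("2024-05-01", "feature_a", "uuid-sm-1", "gitlab.example.com"),
]
print(daily_customer_counts(events, uuid_with_gitlab_com_collapsed))
print(daily_customer_counts(events, host_name_identity))
```

Both identities report two customers for this sample. They differ for self-managed instances: option 1 still keys on the UUID, while option 2 would merge or split self-managed instances whenever host names are shared or change, which is the open question raised above. Option 3 would make `instance_uuid` itself cluster-wide, so no remapping would be needed.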