Discussion: Data Insights Platform use-cases & deployments
Context
As our implementation of the Data Insights Platform matures, we see multiple use-cases that the Platform can support. This (evolving) document provides an overview of these use-cases and how each one intends to use the Platform from a deployment & functionality perspective.
Use cases
| Use case | Proposed timeline | Deployment/functional needs |
|---|---|---|
| Usage-billing | FY26Q4 | Single, company-wide instance capable of ingesting Snowplow events from multiple GitLab systems to process consumption and usage billing. Initially, we intend to emit such billable events from the AI gateway (Duo Workflow usage) and dedicated hosted runners. |
| Replace internal Snowplow infrastructure | FY27Q1 | Single, company-wide instance capable of ingesting Snowplow events from multiple environments, i.e. .com, Self-Managed and Dedicated. |
| Replicating Postgres data into ClickHouse | FY26Q3 | Dedicated instance per associated GitLab instance, with access to Postgres and ClickHouse. |
Dimensions driving the design of the Platform
- Scale and/or volumes of ingested & processed data.
- Logical location of clients interacting with the Platform.
- Regulatory concerns around data being ingested & processed.
- Code deployments and/or release cadence.
- Interaction with external dependencies, e.g. Postgres data within Siphon.
Deployment types
To ensure the Data Insights Platform can serve all of the aforementioned use-cases, we see the following deployment types, each intentionally designed to cater to one or more of those use-cases. From a logical architecture perspective, the Platform & its components remain the same as described here, with the ability to switch parts of the pipeline on or off as needed.
company-wide: Single, company-wide Platform instance
This deployment type is a single, large instance that runs the Platform components and ingests & processes data generated across multiple environments.
For now, this deployment type is geared towards our internal needs at GitLab: replacing our current Snowplow infrastructure and landing all usage data from .com, Self-Managed and Dedicated instances in a single, unified data platform.
feature-scoped: Single Platform instance isolated per use-case
This deployment type is similar to the one above but is dedicated to a single use-case, potentially running in an isolated or regulated environment, e.g. one subject to SOX controls.
We're still working out the details of a first such use-case, Usage Billing for GitLab, where a Platform instance could run within CustomerDot to support the necessary audit controls.
instance-local: Dedicated Platform instance per deployed GitLab instance
This deployment type runs within the same environment as its associated GitLab deployment and is geared towards ingesting & processing data generated locally in that environment by GitLab or other related systems.
For now, our use-case of replicating Postgres data into ClickHouse can use this deployment type, ensuring all data remains isolated within the same context as GitLab. Going forward, this deployment type could also serve analytical use-cases in air-gapped environments without data having to leave any logical boundaries.
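To make the relationship between use-cases, deployment types and the ability to switch parts of the pipeline on or off a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not the Platform's actual configuration: the component names (`SNOWPLOW_INGESTION`, `EVENT_PROCESSING`, `POSTGRES_REPLICATION`, `CLICKHOUSE_SINK`), the use-case keys, the `DeploymentProfile` structure and the environment strings are all hypothetical. The use-case to deployment-type mapping follows the descriptions above, and the `local_only` flag models the instance-local rule that data does not cross the environment boundary.

```python
from dataclasses import dataclass
from enum import Enum


class DeploymentType(Enum):
    COMPANY_WIDE = "company-wide"
    FEATURE_SCOPED = "feature-scoped"
    INSTANCE_LOCAL = "instance-local"


class Component(Enum):
    # Hypothetical pipeline parts that a deployment can switch on or off.
    SNOWPLOW_INGESTION = "snowplow-ingestion"
    EVENT_PROCESSING = "event-processing"
    POSTGRES_REPLICATION = "postgres-replication"
    CLICKHOUSE_SINK = "clickhouse-sink"


@dataclass(frozen=True)
class DeploymentProfile:
    deployment_type: DeploymentType
    components: frozenset[Component]
    # True when the instance only accepts data from its own environment.
    local_only: bool

    def enabled(self, component: Component) -> bool:
        """Return whether this pipeline part is switched on."""
        return component in self.components

    def accepts(self, source_env: str, local_env: str) -> bool:
        """Return whether data from source_env may enter an instance running in local_env."""
        if self.local_only:
            return source_env == local_env
        return True


# Hypothetical profiles: same logical Platform, different parts enabled.
PROFILES = {
    DeploymentType.COMPANY_WIDE: DeploymentProfile(
        DeploymentType.COMPANY_WIDE,
        frozenset({Component.SNOWPLOW_INGESTION, Component.EVENT_PROCESSING}),
        local_only=False,
    ),
    DeploymentType.FEATURE_SCOPED: DeploymentProfile(
        DeploymentType.FEATURE_SCOPED,
        frozenset({Component.SNOWPLOW_INGESTION, Component.EVENT_PROCESSING}),
        local_only=False,
    ),
    DeploymentType.INSTANCE_LOCAL: DeploymentProfile(
        DeploymentType.INSTANCE_LOCAL,
        frozenset({Component.POSTGRES_REPLICATION, Component.CLICKHOUSE_SINK}),
        local_only=True,
    ),
}

# Hypothetical use-case keys mapped to the deployment types described above.
USE_CASES = {
    "replace-snowplow": DeploymentType.COMPANY_WIDE,
    "usage-billing": DeploymentType.FEATURE_SCOPED,
    "postgres-to-clickhouse": DeploymentType.INSTANCE_LOCAL,
}


def profile_for(use_case: str) -> DeploymentProfile:
    """Look up the deployment profile serving a given use-case."""
    return PROFILES[USE_CASES[use_case]]


if __name__ == "__main__":
    local = profile_for("postgres-to-clickhouse")
    print(local.deployment_type.value)  # instance-local
    print(local.enabled(Component.POSTGRES_REPLICATION))  # True
    print(local.accepts("dedicated", "self-managed"))  # False
```

The design choice this is meant to illustrate is that all three deployment types share one logical Platform definition and differ only in which components are enabled and whether data may arrive from other environments.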