Discussion: Data Insights Platform use-cases & deployments

Context

As our implementation of the Data Insights Platform matures, we see multiple use-cases that the Platform can support. This (evolving) document provides an overview of these use-cases and details how each intends to use the Platform from a deployment & functionality perspective.

Use cases

| Feature | Proposed timeline | Deployment/functional needs |
| --- | --- | --- |
| Usage-billing | FY26Q4 | Single, company-wide instance capable of ingesting Snowplow events from across multiple GitLab systems to process consumption and usage-billing. Initially, we intend to emit such billable events from AI gateway (Duo Workflow usage) and dedicated hosted runners. |
| Replace internal Snowplow infrastructure | FY27Q1 | Single, company-wide instance capable of ingesting Snowplow events from multiple environments, i.e. .com, Self-Managed and Dedicated. |
| Replicating Postgres data into ClickHouse | FY26Q3 | Dedicated instance per associated GitLab instance, having access to Postgres and ClickHouse. |

Dimensions driving design of the Platform

  • Scale and/or volumes of ingested & processed data.
  • Logical presence of clients interacting with the Platform.
  • Regulatory concerns around data being ingested & processed.
  • Code deployments and/or release cadence.
  • Interaction with external dependencies, e.g. Postgres data within Siphon.

Deployment types

To ensure the Data Insights Platform can serve all of the use-cases above, we see the following deployment types, each intentionally designed to cater to one or more of them. From a logical architecture perspective, the Platform & its components remain the same as described here, with the capability to switch parts of the pipeline on or off as needed.

company-wide: Single, company-wide Platform instance

This deployment type is a single, large instance that's capable of running Platform components, ingesting & processing data generated across multiple environments.

For now, this deployment type is geared towards our needs within GitLab: to replace our current Snowplow infrastructure and land all usage data from .com, Self-Managed and Dedicated instances in a single unified data platform.

feature-scoped: Single Platform instance isolated per use-case

This deployment type is similar to the one above but runs dedicated to a single use-case, potentially in an isolated/regulated environment, e.g. SOx-controlled.

We're still working out the details of the first such use-case, Usage Billing for GitLab, wherein such a Platform instance can run within CustomerDot to facilitate the necessary audit controls.

instance-local: Dedicated Platform instance per deployed GitLab instance

This deployment type sits within the same environment as the associated GitLab deployment and is geared towards ingesting & processing data generated locally within that environment by GitLab or other related systems.

For now, our use-case around replicating Postgres data into ClickHouse can leverage this deployment type, ensuring all data remains isolated within the same context as GitLab. Going forward, such a deployment type can also target analytical use-cases for air-gapped environments without data having to leave any logical boundaries.
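
To make the "same components, with parts of the pipeline switched on or off" idea concrete, the three deployment types above could be modelled as profiles over one shared set of pipeline stages. This is only an illustrative sketch, not the Platform's actual configuration: the stage names (`snowplow_ingestion`, `usage_billing`, `postgres_to_clickhouse`), the `PipelineProfile` class and the `PROFILES` mapping are all hypothetical, and the stage assignments simply mirror the use-cases named in this document (Usage Billing placement is still under discovery).

```python
from dataclasses import dataclass
from enum import Enum


class DeploymentType(Enum):
    COMPANY_WIDE = "company-wide"
    FEATURE_SCOPED = "feature-scoped"
    INSTANCE_LOCAL = "instance-local"


# Hypothetical stage names, chosen only to mirror the use-cases in this document.
ALL_STAGES = ("snowplow_ingestion", "usage_billing", "postgres_to_clickhouse")


@dataclass(frozen=True)
class PipelineProfile:
    """One deployment of the same logical pipeline, with some stages enabled."""

    deployment_type: DeploymentType
    enabled: frozenset

    def __post_init__(self):
        # Reject stage names that are not part of the shared pipeline.
        unknown = self.enabled - set(ALL_STAGES)
        if unknown:
            raise ValueError(f"unknown stages: {sorted(unknown)}")

    def is_enabled(self, stage: str) -> bool:
        return stage in self.enabled

    def stages(self) -> list:
        # Return enabled stages in the canonical pipeline order.
        return [s for s in ALL_STAGES if s in self.enabled]


# Assumed assignments, for illustration only.
PROFILES = {
    # Replaces the internal Snowplow infrastructure.
    DeploymentType.COMPANY_WIDE: PipelineProfile(
        DeploymentType.COMPANY_WIDE, frozenset({"snowplow_ingestion"})
    ),
    # Usage Billing in an isolated environment (details still being discovered).
    DeploymentType.FEATURE_SCOPED: PipelineProfile(
        DeploymentType.FEATURE_SCOPED,
        frozenset({"snowplow_ingestion", "usage_billing"}),
    ),
    # Postgres-to-ClickHouse replication next to a GitLab instance.
    DeploymentType.INSTANCE_LOCAL: PipelineProfile(
        DeploymentType.INSTANCE_LOCAL, frozenset({"postgres_to_clickhouse"})
    ),
}

if __name__ == "__main__":
    for deployment_type, profile in PROFILES.items():
        print(deployment_type.value, profile.stages())
```

Keeping a single stage list and varying only which stages are enabled reflects the statement that the logical architecture stays the same across deployment types; a real deployment would presumably express this in its own configuration format.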

Resources

Edited by Ankit Bhatnagar