Commit e375f84e authored by Carolyn Braza

Data for PMs update

parent 87e5f72e
+23 −14
@@ -25,7 +25,7 @@ Here are some useful links that we recommend for you to bookmark:

### Getting Tableau Access

-In order to gain access to Tableau, you will need to follow the instructions [here](/handbook/enterprise-data/platform/tableau/#access) and open an access request.
+In order to gain access to Tableau, you will need to follow the instructions [here](/handbook/enterprise-data/platform/tableau/#access) and request access via Lumos.

- To create your own charts and dashboards, you need to have a Creator or Explorer license. You can read more about the Tableau license types [here](/handbook/enterprise-data/platform/tableau/#capabilities).

@@ -85,7 +85,7 @@ If your analytics needs for your new or recently modified feature are met by the
- [PD: Product Usage Metrics (.com & Service Ping)](https://10az.online.tableau.com/#/site/gitlab/workbooks/2478263/views)
- [PD: Firmographic Product Metric Usage](https://10az.online.tableau.com/#/site/gitlab/workbooks/2137023/views)
- [PD: Subscription Feature Usage Trends](https://10az.online.tableau.com/t/gitlab/views/PDSubscriptionFeatureUsageTrends_17032798065680/ActiveSubscriptionUsageTrends)
-- [AI Gateway Reporting](https://10az.online.tableau.com/t/gitlab/views/AIGatewayReporting/Overview)
+- [AI Reporting](https://10az.online.tableau.com/t/gitlab/views/AIGatewayReporting/Overview)

### Process for Instrumenting Feature Tracking

@@ -134,7 +134,7 @@ If your analytics needs for your new or recently modified feature are met by the
   - For analyses requiring aggregated SM and Dedicated data (Service Ping), data collection will be sufficient for analysis 6-8 weeks after MR merge due to minimum version adoption requirement for Service Ping metrics
   - Complete requirements specified in PDI Issue (if applicable)

-### Special Considerations for AI Gateway Features
+### Special Considerations for AI Features

When instrumenting features routed through the AI Gateway, follow these guidelines:

@@ -151,10 +151,10 @@ When instrumenting features routed through the AI Gateway, follow these guidelin

1. For more granular reporting
   - If you need more detail than a 'request' of the AI Gateway at the broad feature grain, use [Internal Events Tracking](https://docs.gitlab.com/ee/development/internal_analytics/internal_event_instrumentation/quick_start.html)
-   - Internal events can be connected to unit primitive events using a `correlation_id` for behavior funnel or more granular reporting use cases (GitLab.com only)
+   - Internal events can be connected to unit primitive events using a `correlation_id` for behavior funnel or more granular reporting use cases

1. Viewing AI Gateway data
-   - [AI Gateway Reporting](https://10az.online.tableau.com/t/gitlab/views/AIGatewayReporting/Overview) automatically displays new unit primitive requests once they have been instrumented by ~"group::analytics instrumentation"
+   - [AI Reporting](https://10az.online.tableau.com/t/gitlab/views/AIGatewayReporting/Overview) automatically displays new unit primitive requests once they have been instrumented by ~"group::analytics instrumentation"
   - Additional analytics can be requested by creating a [Product Data Insights (PDI) Issue](https://gitlab.com/gitlab-data/product-analytics/-/issues/new)
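
As a rough illustration of the `correlation_id` connection described above, the sketch below attaches internal events to the unit primitive request that shares their `correlation_id`. Only `correlation_id` comes from the text; the record shapes, the other field names (`unit_primitive`, `event_action`), and the sample values are hypothetical placeholders, not the warehouse schema.

```python
from collections import defaultdict


def build_funnels(unit_primitive_events, internal_events):
    """Group internal events under the unit primitive request with the same correlation_id."""
    # Index internal event actions by correlation_id, preserving arrival order.
    actions_by_correlation = defaultdict(list)
    for event in internal_events:
        actions_by_correlation[event["correlation_id"]].append(event["event_action"])

    funnels = []
    for request in unit_primitive_events:
        cid = request["correlation_id"]
        funnels.append(
            {
                "correlation_id": cid,
                "unit_primitive": request["unit_primitive"],
                # Requests with no matching internal events get an empty list.
                "internal_actions": actions_by_correlation.get(cid, []),
            }
        )
    return funnels


# Hypothetical sample records, for illustration only.
unit_primitive_events = [
    {"correlation_id": "abc", "unit_primitive": "code_suggestions"},
    {"correlation_id": "def", "unit_primitive": "duo_chat"},
]
internal_events = [
    {"correlation_id": "abc", "event_action": "suggestion_shown"},
    {"correlation_id": "abc", "event_action": "suggestion_accepted"},
]

funnels = build_funnels(unit_primitive_events, internal_events)
```

Keeping every unit primitive request in the output, even when no internal event matches, makes it possible to see where a behavior funnel drops off.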

### Key Contacts and Resources
@@ -170,7 +170,7 @@ We have three primary data sources for product usage data:

- **Service Ping** (for Self-Managed, Dedicated, and GitLab.com)
- **GitLab.com Postgres Database** (for GitLab.com)
-- **Snowplow** (for GitLab.com and the AI Gateway)
+- **Snowplow (Internal Events)** (for Self-Managed, Dedicated, GitLab.com, and AI/DAP)

Each data source comes with its own caveats, capabilities, and limitations. The first question we on the Data or PDI teams ask product managers is usually "are you interested in knowing this for Self-Managed or GitLab.com?" Our approach to answering your question and the data source(s) available differ greatly between the two. Although our Self-Managed offering has many more active customers, our GitLab.com offering has much more granular data available to analyze.

@@ -211,7 +211,7 @@ Here is an example of a query that provides ping-level details, filters out GitL
```sql
SELECT *
FROM common_mart.mart_ping_instance
-WHERE ping_created_at >= CURRENT_DATE-30
+WHERE ping_created_date_month = DATEADD('month', -1, DATE_TRUNC('month', CURRENT_DATE)) --last completed month
  AND ping_deployment_type != 'GitLab.com'
  AND is_last_ping_of_month = TRUE
LIMIT 1000
@@ -232,7 +232,7 @@ SELECT
  SUM(monthly_metric_value) AS monthly_metric_value,
  COUNT(DISTINCT IFF(monthly_metric_value > 0, dim_installation_id, NULL)) AS installation_count --count of installations reporting usage that month
FROM common_mart.mart_ping_instance_metric_monthly --this model already filters to the last ping of the month
-WHERE ping_created_date_month >= '2024-06-01'
+WHERE ping_created_date_month BETWEEN DATEADD('month', -6, DATE_TRUNC('month', CURRENT_DATE)) AND DATEADD('month', -1, DATE_TRUNC('month', CURRENT_DATE)) --last 6 complete months
  AND metrics_path = 'usage_activity_by_stage_monthly.secure.sast_scans' --arbitrary metric, switch this out for metric of interest
GROUP BY 1,2,3
ORDER BY 1,2
@@ -249,7 +249,13 @@ Because GitLab.com is a GitLab instance hosted by GitLab, we have access to the

#### What if the table or column I want isn't in the data warehouse?

-Our ELT process works by explicitly stating which columns and tables we want to import into the data warehouse. This means we might be missing a column or whole table that you want to have in the data warehouse for analysis. When this is the case, please create a Data issue letting us know what you want us to import using the [New Data Source template](https://gitlab.com/gitlab-data/analytics/-/issues/new?issuable_template=%5BNew%20Request%5D%20New%20Data%20Source). Before doing so, please confirm that the table/column is truly part of the [production schema](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql).
+Our ELT process works by explicitly stating which columns and tables we want to import into the data warehouse. This means we might be missing a column or whole table that you want to have in the data warehouse for analysis.
+When this is the case, please create a Data issue letting us know what you want us to import using one of the following two templates:
+
+1. If an entire table needs to be imported, use the [New Data Source template](https://gitlab.com/gitlab-data/analytics/-/work_items/new?description_template=%5BNew%20Request%5D%20New%20Data%20Source)
+2. If a column needs to be added to an existing table, use the [New Data Column template](https://gitlab.com/gitlab-data/analytics/-/work_items/new?description_template=%5BNew%20Request%5D%20New%20Data%20Column)
+
+Before doing so, please confirm that the table/column is truly part of the [production schema](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql).

#### Replicating Service Ping using GitLab.com Data

@@ -287,7 +293,7 @@ SELECT
  group_name,
  user_count
FROM common_mart_product.rpt_event_xmau_metric_monthly
-WHERE event_calendar_month >= '2024-06-01'
+WHERE event_calendar_month BETWEEN DATEADD('month', -6, DATE_TRUNC('month', CURRENT_DATE)) AND DATEADD('month', -1, DATE_TRUNC('month', CURRENT_DATE)) --last 6 complete months
  AND is_gmau = TRUE
  AND user_group = 'paid'
ORDER BY 2,1
@@ -302,15 +308,18 @@ Snowplow Analytics is an open-source enterprise event-level analytics platform t

- We pseudonymize `user_id` on all Snowplow events, meaning that we are unable to connect an event to a specific user (or the GitLab.com Postgres db).
  - We also pseudonymize page URLs to remove any potential PII or RED data.
-- Self-Managed and Dedicated installations do not send Snowplow data to GitLab.
-  - The one exception is the AI Gateway where we receive events from all deployment types (Self-Managed, Dedicated, and GitLab.com).
+- Self-Managed and Dedicated installations started sending event-level data in 18.0.

#### Key Concepts

-- Because Snowplow does not rely on Service Ping, we do not need to wait for a version of GitLab to be adopted to start receiving data. We can collect and visualize data as soon as the instrumentation is deployed.
+- Like Service Ping, Snowplow data is dependent on version adoption for Self-Managed and Dedicated installations. This means that we need to wait for a version of GitLab to be adopted before we start receiving data.
+  - Note: Since GitLab.com is always on the latest version of GitLab, we start collecting data as soon as the instrumentation is deployed.
- Even though the pseudonymization of `user_id` of Snowplow events is a limitation, with the fast feedback, Snowplow is an effective source of data to measure feature adoption and usage.
  - Note: We are still able to count the number of users who engage with a feature, which is sufficient for most use cases. We just do not know who those users are.
-- Snowplow events can be blocked by the user.
+- In the case of Self-Managed and Dedicated installations, sending event-level data is optional but defaults to being on.
+  - You can find the event-level data opt-in rate in [this dashboard](https://10az.online.tableau.com/#/site/gitlab/workbooks/3294705/views).
+  - The one exception is AI events, which users cannot opt out of on any deployment type (Self-Managed, Dedicated, and GitLab.com).
+- Snowplow events can be blocked by the user (with the exception of AI events).
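
To illustrate why pseudonymized identifiers still support user counts, here is a minimal sketch. It assumes pseudonymization behaves like a deterministic keyed hash; that is an assumption made for illustration, not a description of the actual pipeline. The salt, the function name, and the sample events are all hypothetical.

```python
import hashlib
import hmac

SALT = b"example-salt"  # hypothetical secret, for illustration only


def pseudonymize(user_id: str) -> str:
    """Map a user_id to a stable pseudonym (assumed keyed-hash behavior)."""
    return hmac.new(SALT, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical raw events: (user_id, event name). User "1001" appears twice.
raw_events = [
    ("1001", "click_button"),
    ("1002", "click_button"),
    ("1001", "click_button"),
]

# Downstream consumers only ever see the pseudonymized user_id.
events = [{"user_id": pseudonymize(uid), "event": name} for uid, name in raw_events]

# The same user always maps to the same pseudonym, so distinct counts still work,
# even though the pseudonym cannot be joined back to a specific user.
distinct_users = len({e["user_id"] for e in events})
```

Under this assumption, counting distinct pseudonyms gives the number of engaged users without revealing who they are.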

#### Instrumentation