Large discrepancy between Ultimate installations in mart_behavior_structured_event and Service Ping data

Problem

When querying Self-Managed data in the mart_behavior_structured_event table, there is a significant discrepancy between Ultimate and Premium installations when comparing to Service Ping data.

Data Findings (October 2025)

Ultimate tier:

  • 37.6 million user IDs in structured events
  • Only 39% of Ultimate dim_installation_ids in structured events had corresponding IDs in mart_ping_instance_metric_monthly
  • Only 27% of installations had CRM data

Premium tier (for comparison):

  • 150K user IDs in structured events
  • 94% of Premium installations in structured events had corresponding installation IDs in Service Ping
  • 92% of installations had CRM data

Summary Table

Plan Structured Events Installations Service Ping Installations Installations with CRM % in Service Ping % with CRM
Premium 2,382 2,231 2,198 94% 92%
Ultimate 3,411 1,322 933 39% 27%

Query Used

WITH crms AS (

SELECT
    MAX(dim_crm_account_id) AS dim_crm_account_id, --To return a dim CRM account ID if available
    dim_installation_id,
    ping_product_tier,
    MAX(umau_value) AS umau_value
FROM common_mart.mart_ping_instance_metric_monthly
WHERE is_umau = TRUE
--AND is_program_subscription = FALSE
--AND is_paid_subscription = TRUE
AND ping_created_date_month = '2025-10-01'
GROUP BY ALL

), mart_and_crm AS (


SELECT
    mart.dim_installation_id        AS mart_behavior_installation_id,
    crms.dim_installation_id AS service_ping_installation_id,
    crms.dim_crm_account_id,
    mart.is_internal_installation,
    crms.umau_value,
    plan_name_modified,
    COUNT(DISTINCT gitlab_global_user_id) AS users
FROM common_mart.mart_behavior_structured_event mart
LEFT JOIN crms ON crms.dim_installation_id = mart.dim_installation_id
AND LOWER(plan_name_modified) = LOWER(ping_product_tier)
WHERE DATE_TRUNC('month',behavior_at) = '2025-10-01'
AND IFNULL(gsc_is_gitlab_team_member,FALSE) = FALSE
AND deployment_type IS NOT NULL
AND event_category = 'Users::ActivityService' --Filtering for broad event category
AND user_type = 'human' --Filter for human users
AND deployment_type = 'Self-Managed'
GROUP BY ALL ORDER BY users DESC 

)

SELECT
    plan_name_modified,
    COUNT(DISTINCT mart_behavior_installation_id)                                           AS mart_behavior_installation_count,
    COUNT(DISTINCT service_ping_installation_id)                                            AS service_ping_installation_count,
    COUNT(DISTINCT IFF(dim_crm_account_id IS NOT NULL, mart_behavior_installation_id,NULL)) AS installations_with_crm,
    DIV0(service_ping_installation_count,mart_behavior_installation_count)                  AS "% of installations in Service Ping",
    DIV0(installations_with_crm,mart_behavior_installation_count)                           AS "% of installations with CRM"
FROM mart_and_crm
WHERE plan_name_modified IN ('premium','ultimate')
GROUP BY 1

Questions

  1. What is driving the large difference in Ultimate installations between structured events and Service Ping data?
  2. Why is the Premium tier showing much better alignment (94%) compared to Ultimate (39%)?
  3. Is this a data quality issue, a configuration issue with Ultimate installations, or expected behavior?

Reporter

Dave Peterson (via Slack) CC: @Carolyn Braza

Edited by 🤖 GitLab Bot 🤖