Add pre-aggregations for Cube refresh workers

Robert Hunt requested to merge initial-sessions-pre-aggregations into main

What does this MR do and why?

This MR updates Cube to handle dynamic refresh contexts. It retrieves the list of activated apps from the ClickHouse active_apps table, then uses the project_id values from that table to generate an array of tables to refresh.
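As a rough illustration of the idea (not the code in this MR), the mapping from active_apps rows to refresh contexts could look something like the sketch below. The helper name `buildRefreshContexts` and the `appId` key inside `securityContext` are assumptions made for the example; only the `project_id` column comes from the description above. Cube's `scheduledRefreshContexts` config option expects an array of context objects of this general shape.

```javascript
// Hypothetical helper: turn rows read from the active_apps table into
// the array of contexts that Cube's scheduledRefreshContexts returns.
// The `appId` key is an assumption; use whatever the schema reads
// from the security context.
function buildRefreshContexts(rows) {
  const seen = new Set();
  const contexts = [];
  for (const row of rows) {
    // Skip rows without a usable project_id.
    if (row.project_id === null || row.project_id === undefined) continue;
    const projectId = String(row.project_id);
    // Emit each project only once, even if the table has duplicates.
    if (seen.has(projectId)) continue;
    seen.add(projectId);
    contexts.push({ securityContext: { appId: projectId } });
  }
  return contexts;
}

// Possible wiring in cube.js, where fetchActiveApps() is a hypothetical
// function that queries ClickHouse for the active_apps rows:
//
// module.exports = {
//   scheduledRefreshContexts: async () =>
//     buildRefreshContexts(await fetchActiveApps()),
// };
```

Keeping the mapping as a pure function makes it easy to test without a running ClickHouse instance.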

It is worth noting that right now, Funnels will not be refreshable. I've left a discussion topic below.

How to set up and validate locally

  1. Pull this branch and modify the cube/schema/Sessions.js schema to include a dummy pre-aggregation for testing purposes:

Note: This pre-aggregation will cause the following error: Error: Code: 44. DB::Exception: Sorting key contains nullable columns, but merge tree setting allow_nullable_key is disabled. (ILLEGAL_COLUMN) (version 23.8.2.7 (official build)). That's a table structure problem we need to solve when we begin to use pre-aggregations.

preAggregations: {
  main: {
    external: false,
    measures: [
      CUBE.count
    ],
    dimensions: [
      CUBE.agentName,
      CUBE.agentVersion,
      CUBE.startAt
    ],
    time_dimension: CUBE.startAt,
    granularity: `day`,
    partition_granularity: `month`,
    indexes: {
      sessions_index: {
        columns: [CUBE.agentName, CUBE.agentVersion],
      },
    },
    refresh_key: {
      every: `1 minute`,
      sql: `SELECT MAX(derived_tstamp) FROM ${TrackedEvents.sql()}`
    }
  }
},
  2. Delete your existing containers: docker compose -f docker-compose.yml down
  3. Create the containers again in a detached state: docker compose -f docker-compose.yml up -d --build --remove-orphans
  4. Set up ClickHouse if needed: curl -X POST http://localhost:4567/setup-clickhouse -u test:test
  5. If you haven't already, go through your GDK to set up a new project.
  6. You should find that the setup process has created the cube/node_modules folder and that pre-aggregations have started running: docker compose logs -f -t cube
  7. Visit your Cube instance's playground and run a query that uses the measures and dimensions from the pre-aggregation:
{
  "measures": [
    "Sessions.count"
  ],
  "timeDimensions": [
    {
      "dimension": "Sessions.startAt",
      "granularity": "day"
    }
  ],
  "order": {
    "Sessions.count": "desc"
  },
  "dimensions": [
    "Sessions.agentName",
    "Sessions.agentVersion"
  ]
}
  8. Verify that the UI reports it is using the pre-aggregation and that the logs show pre-aggregations running.
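The same check can also be scripted against Cube's REST API instead of the playground. The sketch below assumes the API is reachable at a base URL such as http://localhost:4000 (Cube's default port, which may differ in this stack) and that the load response carries a `usedPreAggregations` object, either at the top level or inside a `results` array. The helper names are hypothetical.

```javascript
// The same query used in the playground step above.
const sessionsQuery = {
  measures: ["Sessions.count"],
  timeDimensions: [{ dimension: "Sessions.startAt", granularity: "day" }],
  order: { "Sessions.count": "desc" },
  dimensions: ["Sessions.agentName", "Sessions.agentVersion"],
};

// Build the GET URL for Cube's /cubejs-api/v1/load endpoint.
function buildLoadUrl(baseUrl, query) {
  const encoded = encodeURIComponent(JSON.stringify(query));
  return `${baseUrl}/cubejs-api/v1/load?query=${encoded}`;
}

// Collect the names of the pre-aggregations a load response reports using.
// Handles both a flat response and one wrapped in a `results` array.
function usedPreAggregationNames(loadResponse) {
  const results = Array.isArray(loadResponse.results)
    ? loadResponse.results
    : [loadResponse];
  return results.flatMap((r) => Object.keys(r.usedPreAggregations || {}));
}

// Example wiring (needs a running Cube instance and, depending on the
// setup, an Authorization header with an API token):
//
// const res = await fetch(buildLoadUrl("http://localhost:4000", sessionsQuery));
// console.log(usedPreAggregationNames(await res.json()));
```

An empty list would suggest the query was served from the source tables rather than from the pre-aggregation.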

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Relates to https://gitlab.com/gitlab-org/analytics-section/product-analytics/analytics-stack/-/issues/69

Edited by Jiaan Louw