Improve Sessions database performance
Issue
Sessions calculations currently use more memory than ClickHouse allows. Queries regularly use 100GB or more of RAM per run.
We can use materialized views to pre-calculate sessions, which reduces memory usage at query time.
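To illustrate why pre-calculation helps, here is a minimal Python sketch of the idea, not the actual ClickHouse schema. It assumes each event carries a session ID, a user ID, and a Unix timestamp; the `SessionsView` class and its field names are hypothetical. Per-session state is updated as events are inserted, roughly the way a ClickHouse materialized view transforms rows at insert time, so a query only reads one small record per session instead of grouping every raw event.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (session_id, user_id, unix_timestamp).
Event = Tuple[str, str, int]


@dataclass
class SessionState:
    """Per-session aggregate, analogous to one row of a pre-calculated view."""
    user_id: str
    first_event: int  # Unix seconds of the earliest event seen
    last_event: int   # Unix seconds of the latest event seen
    event_count: int = 0


class SessionsView:
    """Keeps session aggregates up to date at insert time."""

    def __init__(self) -> None:
        self.sessions: Dict[str, SessionState] = {}

    def insert(self, events: Iterable[Event]) -> None:
        # The aggregation cost is paid per inserted batch, not per query.
        for session_id, user_id, ts in events:
            state = self.sessions.get(session_id)
            if state is None:
                state = SessionState(user_id, ts, ts)
                self.sessions[session_id] = state
            state.first_event = min(state.first_event, ts)
            state.last_event = max(state.last_event, ts)
            state.event_count += 1

    def average_duration(self) -> float:
        # The query touches only the aggregated rows, never the raw events.
        if not self.sessions:
            return 0.0
        total = sum(s.last_event - s.first_event for s in self.sessions.values())
        return total / len(self.sessions)


view = SessionsView()
view.insert([("s1", "u1", 100), ("s1", "u1", 160)])
view.insert([("s2", "u2", 200), ("s1", "u1", 130), ("s2", "u2", 230)])
print(view.average_duration())  # 45.0
```

The trade-off is that events inserted before the view existed are not reflected in it, which is why migrating old events data is tracked separately.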
Implementation plan
- Move sessions to a pre-calculated materialized view - https://gitlab.com/gitlab-org/analytics-section/analytics-configurator/-/merge_requests/34+s
- https://gitlab.com/gitlab-org/analytics-section/product-analytics/analytics-stack/-/merge_requests/135+s
- Add updated sessions and returning users cubes ... (gitlab-org/analytics-section/product-analytics/devkit!92 - merged) • Max Woolf • 16.6
- Update audience dashboard and viz designer to s... (!134426 - merged) • Robert Hunt, Max Woolf • 16.6
- Update cube to use new sessions materialized view
Next steps (not this issue)
- Determine a strategy to migrate old events data into the sessions materialized view (https://gitlab.com/gitlab-org/analytics-section/product-analytics/analytics-stack/-/issues/79+).
Acceptance criteria/Measures of success
- All known queries that use > 100GB of RAM today use less memory, by up to 50%
- All known queries using > 100GB of RAM today run 25% faster
- Zero queries in the built-in Audience dashboard should fail for gitlab-org/gitlab.
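The memory and speed criteria above could be checked per query with something like the following sketch. The `QueryRun` type, the function name, and the sample numbers are hypothetical, and "25% faster" is interpreted here as "takes at most 75% of the baseline duration", which is an assumption rather than something this issue defines.

```python
from dataclasses import dataclass


@dataclass
class QueryRun:
    """One measured run of a query (hypothetical measurement record)."""
    peak_memory_gb: float
    duration_s: float


def meets_criteria(before: QueryRun, after: QueryRun) -> bool:
    # Criterion 1: the query uses less memory than it did before.
    uses_less_memory = after.peak_memory_gb < before.peak_memory_gb
    # Criterion 2 (assumed reading): duration is at most 75% of the baseline.
    runs_faster = after.duration_s <= 0.75 * before.duration_s
    return uses_less_memory and runs_faster


# Illustrative numbers only, not real measurements.
baseline = QueryRun(peak_memory_gb=120.0, duration_s=40.0)
improved = QueryRun(peak_memory_gb=70.0, duration_s=30.0)
print(meets_criteria(baseline, improved))  # True
```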
Edited by Max Woolf