Adds migration for sessions partition table
About:
This aims to partition an existing sessions table for projects that've been onboarded to product analytics (partition by YYYYMM). This change aims to achieve better query performance for the PA dashboards. This also adds a migration rollback command to rollback the last executed migration on projects.
Strategy:
- List all the databases from activeApps table. For every project onboarded to product analytics, there is a corresponding database.
- Create a partitioned sessions table for all those databases, if the table doesn't already exist. (some of the newly onboarded projects might already have this table, ref !9 (merged))
- Create an additional materialized view from
snowplow_events
tosessions_partition
table. This will ensure any new writes tosessions
table is also propagated tosessions_partition
without incurring a dependency onsessions
table and view so that we can deprecate thesessions
table and view in future without the need to change source forsessions_partition
. - A migration to copy existing data from
sessions
tosessions_partition
- How and when to run this query is TBD
Steps to setup and validate locally
-
Select any project database (project should be gitlab project already onboarded to product analytics), preferably the one having some entries in
sessions
andsnowplow_events
table) -
Run the migration manually by following steps in README.md.
-
Validate that a partition table for
sessions
namedsessions_partition
is created for the project.SHOW TABLES from gitlab_project_#{project_id}
. -
Validate that a new materialized view to
sessions_partition
is also created. Above command foe showing tables also lists materialized views. -
Try generating an event for that project database or enter some data into
snowplow_events
. Goal is to create some entries insnowplow_events
table. E.g.INSERT INTO gitlab_project_#{id}.snowplow_events SELECT * FROM gitlab_project_#{id}.snowplow_events limit 10
-
Validate that the entry is being saved to the correct partition of
sessions_partition
table. To show all the partitions run the below command.select count() from system.parts where table in ('sessions_partition') and active
-
Validate that the entry is also being saved to original
sessions
table.
Closes #15 (closed)