[Spike] Pipe events directly from Vector into a project-specific database
Problem
Currently, Vector pipes all events into one main table, snowplow_queue, which uses the Null table engine. snowplow_consumer materialized views listen on this table and move the events into project-specific snowplow_events tables. See this diagram:
flowchart LR
  Vector
  subgraph Clickhouse
    subgraph default_db
      snowplow_queue
    end
    subgraph project_db
      snowplow_events
    end
  end
  Vector-->snowplow_queue
  snowplow_queue-->|via snowplow_consumer|snowplow_events
This means that for x projects there are x materialized views attached to the same Null table. Every insert into snowplow_queue triggers all of them, so this can become an insert performance issue, as mentioned in https://double.cloud/blog/posts/2022/12/performance-impact-of-materialized-views-in-clickhouse/.
Desired Outcome
We investigate whether it is possible to insert events directly into the database specific to a project / app_id.
Potential Solution
- Use Vector Remap Language (VRL) to transform Snowplow's TSV into JSON
- Use Vector's template syntax on the database field of our sink.
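The two steps above can be sketched outside of Vector to make the intended data flow concrete. The Python below is only an illustration of the logic, not Vector configuration: the `FIELDS` list is a hypothetical, truncated subset of the TSV columns, and `target_database` stands in for what a `{{ app_id }}` template on the sink's database field would render per event. Whether the ClickHouse sink accepts a template in that field is exactly what this spike needs to confirm.

```python
import json

# Hypothetical, truncated column list used for illustration only;
# the real Snowplow TSV carries many more fields.
FIELDS = ["app_id", "platform", "etl_tstamp", "collector_tstamp"]


def tsv_to_event(line: str) -> dict:
    """Parse one tab-separated line into a dict keyed by FIELDS."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))


def target_database(event: dict, fallback: str = "default") -> str:
    """Pick the database for an event, as a `{{ app_id }}` template would.

    The fallback for events without an app_id is an assumption of this sketch.
    """
    app_id = event.get("app_id") or ""
    return app_id if app_id else fallback


def route(lines):
    """Group JSON-encoded events by the database they would be inserted into."""
    batches = {}
    for line in lines:
        event = tsv_to_event(line)
        batches.setdefault(target_database(event), []).append(json.dumps(event))
    return batches
```

With this routing, each event is written once, straight to its project's snowplow_events table, and no materialized view has to run on insert. The sketch also surfaces one open question for the spike: what should happen to events whose app_id is empty or does not map to an existing database.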