[Spike] Pipe events directly from Vector into project-specific databases

Problem

Currently, Vector pipes all events into a single main table, snowplow_queue, which uses the Null table engine. One snowplow_consumer materialized view per project listens on this table and moves the events into that project's snowplow_events table. See this diagram:

flowchart LR
    Vector
    subgraph Clickhouse
      subgraph default_db
        snowplow_queue
      end
      subgraph project_db
        snowplow_events
      end
    end
   
    Vector-->snowplow_queue
    snowplow_queue-->|via snowplow_consumer|snowplow_events

This means that for x projects there are x materialized views listening on the same Null table, so every insert into snowplow_queue triggers all x views. This can become an insert performance issue, as described in https://double.cloud/blog/posts/2022/12/performance-impact-of-materialized-views-in-clickhouse/.

Desired Outcome

We investigate whether Vector can insert events directly into the database specific to each project / app_id, without going through snowplow_queue.

Potential Solution

  1. Use Vector Remap Language (VRL) to transform Snowplow's TSV into JSON.
  2. Use Vector's template syntax on the database field of our sink, so that the target database is derived from each event.
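
To make the intended data flow concrete, here is a minimal Python sketch of the two steps. It is not Vector configuration and only models the idea: the column list, the `{{ app_id }}` template string, and the function names are assumptions made for illustration, and the real Snowplow TSV has many more columns than the four shown.

```python
import json

# Assumed leading columns of Snowplow's enriched TSV (illustrative subset only).
COLUMNS = ["app_id", "platform", "etl_tstamp", "collector_tstamp"]

# Hypothetical template for the sink's database field.
DATABASE_TEMPLATE = "{{ app_id }}"


def tsv_to_event(line):
    """Step 1: map tab-separated values onto named fields."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(COLUMNS, values))


def render_database(template, event):
    """Step 2: replace each {{ field }} placeholder with the event's value."""
    rendered = template
    for key, value in event.items():
        rendered = rendered.replace("{{ " + key + " }}", value)
    return rendered


def route(line):
    """Return the target database and the JSON payload for one TSV line."""
    event = tsv_to_event(line)
    return render_database(DATABASE_TEMPLATE, event), json.dumps(event)


if __name__ == "__main__":
    sample = "project_42\tweb\t2023-01-01 00:00:00\t2023-01-01 00:00:01"
    database, payload = route(sample)
    print(database, payload)
```

With this approach each event goes straight to its project's database, so no materialized view has to run on insert. One open question for the spike is how events with a missing or unexpected app_id should be handled, since the template would otherwise yield a database that does not exist.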