Add new parallel snowplow destination to support database events
Background
The Data team plans to set up and operate its own Snowplow event collection pipeline to track every interaction with the gitlab.com database. This means that GitLab needs to report to two Snowplow collection endpoints.
Goals
- Modify the existing event tracking library so that it can report to two collectors. Gitlab::Tracking#event should be able to select the Snowplow destination at runtime, with the default behaviour being to report to the Product Intelligence tracking pipeline. Alternatively, a separate method could be added to Gitlab::Tracking that reports to the new endpoint.
- Replace the Gitlab::Tracking#track call at https://gitlab.com/gitlab-org/gitlab/-/blob/6aa3b620a8214f733f3d0acd9bd86384b00d9f84/app/models/concerns/database_event_tracking.rb#L33 with the new method from point 1.
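The two options in the first goal could look roughly like the sketch below. It is a standalone illustration, not the real Gitlab::Tracking code: the TrackingSketch module, the FakeDestination class, the destination: keyword, the :product_intelligence and :database keys, and the database_event method are all hypothetical names chosen for this example.

```ruby
# Hypothetical sketch; none of these names exist in the GitLab codebase.
module TrackingSketch
  # Stand-in for a Snowplow destination. It records events in memory
  # instead of emitting them to a collector.
  class FakeDestination
    attr_reader :name, :events

    def initialize(name)
      @name = name
      @events = []
    end

    def event(category, action, **attrs)
      @events << { category: category, action: action, **attrs }
    end
  end

  class UnknownDestinationError < ArgumentError; end

  class << self
    # Registry of destinations keyed by symbol.
    def destinations
      @destinations ||= {
        product_intelligence: FakeDestination.new('product_intelligence'),
        database: FakeDestination.new('database')
      }
    end

    # Option A: select the destination at runtime. Callers that pass no
    # destination keep reporting to the default pipeline.
    def event(category, action, destination: :product_intelligence, **attrs)
      tracker = destinations.fetch(destination) do
        raise UnknownDestinationError, "unknown destination: #{destination.inspect}"
      end
      tracker.event(category, action.to_s, **attrs)
    end

    # Option B: a separate method that always reports to the new endpoint.
    def database_event(category, action, **attrs)
      event(category, action, destination: :database, **attrs)
    end
  end
end
```

With either option, existing callers stay untouched, and only the call in database_event_tracking.rb would need to switch to the new destination.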
Implementation tips
The following diff shows the PoC changes that were used to check whether Snowplow can report to two endpoints without issue. Some parts of this code might be reused for this issue.
diff --git a/lib/gitlab/tracking.rb b/lib/gitlab/tracking.rb
index 45f836f10d3a..ed65595bfca2 100644
--- a/lib/gitlab/tracking.rb
+++ b/lib/gitlab/tracking.rb
@@ -12,7 +12,7 @@ def event(category, action, label: nil, property: nil, value: nil, context: [],
action = action.to_s
- tracker.event(category, action, label: label, property: property, value: value, context: contexts)
+ trackers.each { |t| t.event(category, action, label: label, property: property, value: value, context: contexts) }
rescue StandardError => error
Gitlab::ErrorTracking.track_and_raise_for_dev_exception(error, snowplow_category: category, snowplow_action: action)
end
@@ -55,6 +55,10 @@ def tracker
Gitlab::Tracking::Destinations::Snowplow.new
end
end
+
+ def trackers
+ @trackers ||= [Gitlab::Tracking::Destinations::SnowplowMicro.new, Gitlab::Tracking::Destinations::Snowplow.new]
+ end
end
end
end
diff --git a/lib/gitlab/tracking/destinations/snowplow.rb b/lib/gitlab/tracking/destinations/snowplow.rb
index fd877bc01378..a75b98e914e9 100644
--- a/lib/gitlab/tracking/destinations/snowplow.rb
+++ b/lib/gitlab/tracking/destinations/snowplow.rb
@@ -40,10 +40,12 @@ def options(group)
def enabled?
Gitlab::CurrentSettings.snowplow_enabled?
+ true
end
def hostname
Gitlab::CurrentSettings.snowplow_collector_hostname
+ "webhook.site/5c4e5edc-d948-4a08-81ec-1e66fe6a7621"
end
private
@@ -60,11 +62,15 @@ def cookie_domain
Gitlab::CurrentSettings.snowplow_cookie_domain
end
+ def snowplow_namespace
+ SNOWPLOW_NAMESPACE
+ end
+
def tracker
@tracker ||= SnowplowTracker::Tracker.new(
emitters: [emitter],
subject: SnowplowTracker::Subject.new,
- namespace: SNOWPLOW_NAMESPACE,
+ namespace: snowplow_namespace,
app_id: app_id
)
end
diff --git a/lib/gitlab/tracking/destinations/snowplow_micro.rb b/lib/gitlab/tracking/destinations/snowplow_micro.rb
index 09480f261064..049ab16685bf 100644
--- a/lib/gitlab/tracking/destinations/snowplow_micro.rb
+++ b/lib/gitlab/tracking/destinations/snowplow_micro.rb
@@ -6,8 +6,9 @@ module Destinations
class SnowplowMicro < Snowplow
include ::Gitlab::Utils::StrongMemoize
extend ::Gitlab::Utils::Override
+ SNOWPLOW_NAMESPACE = 'gl_mic'
- DEFAULT_URI = 'http://localhost:9090'
+ DEFAULT_URI = "https://webhook.site/2776a6ae-38a4-46b1-90e4-bed6a6d9a0bb" #'http://localhost:9090'
override :options
def options(group)
@@ -25,7 +26,11 @@ def enabled?
override :hostname
def hostname
- "#{uri.host}:#{uri.port}"
+ "webhook.site/2776a6ae-38a4-46b1-90e4-bed6a6d9a0bb"
+ end
+
+ def snowplow_namespace
+ SNOWPLOW_NAMESPACE
end
def uri
@@ -53,6 +58,7 @@ def base_uri
url = Gitlab.config.snowplow_micro.address
scheme = Gitlab.config.gitlab.https ? 'https' : 'http'
"#{scheme}://#{url}"
+ DEFAULT_URI
rescue Settingslogic::MissingSetting
DEFAULT_URI
end
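A likely reason the PoC adds a snowplow_namespace method instead of only redefining SNOWPLOW_NAMESPACE in the subclass is that Ruby resolves constants lexically. A method defined in the parent class keeps seeing the parent's constant even when it is called on a subclass. The sketch below illustrates this with hypothetical stand-in classes. The 'gl' value is assumed for illustration, and 'gl_mic' is taken from the diff.

```ruby
# Hypothetical stand-ins for the Snowplow and SnowplowMicro destinations.
class BaseDestination
  NAMESPACE = 'gl' # assumed value, for illustration only

  # Constant lookup is lexical: this resolves to BaseDestination::NAMESPACE
  # even when called on a subclass that defines its own NAMESPACE.
  def namespace_via_constant
    NAMESPACE
  end

  # Method dispatch is dynamic, so a subclass can override this hook.
  def namespace_via_method
    NAMESPACE
  end
end

class MicroDestination < BaseDestination
  NAMESPACE = 'gl_mic'

  # Defined inside MicroDestination, so NAMESPACE resolves to 'gl_mic' here.
  def namespace_via_method
    NAMESPACE
  end
end

micro = MicroDestination.new
puts micro.namespace_via_constant # prints "gl"
puts micro.namespace_via_method   # prints "gl_mic"
```

This is why the tracker in the diff reads the namespace through the overridable method rather than through the constant directly.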
Edited by Mikołaj Wawrzyniak