Skip to content

Gitlab Event Store: pass only the required fields to the subscriber worker

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

From !98373 (comment 1117239660):

@kassio

We could have different events from the same Namespace, so, instead of PagesDomain::PagesDomainUpdatedEvent we could Pages::InternalAPICacheInvalidation. But, this way I think we would have way more specific events an many times similar events for different actions. I'm not sure this would be better. 🤷

I don't think that we should remove the attributes from the event payload. But I think the scheduling of the job should only pass on what is required for that particular job. In pseudocode:

diff --git a/lib/gitlab/event_store/subscriber.rb b/lib/gitlab/event_store/subscriber.rb
index da95d3cfcfa8..4b06c1b6a253 100644
--- a/lib/gitlab/event_store/subscriber.rb
+++ b/lib/gitlab/event_store/subscriber.rb
@@ -22,6 +22,12 @@ module Subscriber
       included do
         include ApplicationWorker
 
+        attr_reader :required_attributes
+
+        def requires_attributes(attrs)
+          @required_attributes = attrs
+        end
+
         loggable_arguments 0, 1
         idempotent!
       end
diff --git a/lib/gitlab/event_store/subscription.rb b/lib/gitlab/event_store/subscription.rb
index 01986355d2d8..f391baf18576 100644
--- a/lib/gitlab/event_store/subscription.rb
+++ b/lib/gitlab/event_store/subscription.rb
@@ -13,7 +13,8 @@ def initialize(worker, condition)
       def consume_event(event)
         return unless condition_met?(event)
 
-        worker.perform_async(event.class.name, event.data.deep_stringify_keys)
+        required_attributes = worker.required_attributes || event.data.keys # Not sure we want the fallback
+        worker.perform_async(event.class.name, event.data.slice(*required_attributes).deep_stringify_keys)
 
         # We rescue and track any exceptions here because we don't want to
         # impact other subscribers if one is faulty.

Which would mean the subscriber could be implemented as follows:

class PagesDomains::InvalidateCacheWorker
  includes Gitlab::EventStore::Subscriber

  requires_attributes [:project_id, :domain] # Or whatever it actually needs
end

This way, the event can have all of the properties that are needed for any kind of subscriber, while the subscriber knows which of those properties it needs to do it's work. With this, we would avoid the properties it does not need to make the roundtrip taking up space in Redis, time to serialize and deserialize and so forth. What do you think?

I think we can work on this orthogonal to the cache invalidation, we already have guardrails in place to limit the amount of data that roundtrips for Sidekiq, and we compress the data that does (Gitlab::SidekiqMiddleware::SizeLimiter::Validator). But for efficiency's sake, I think we should consider it as the EventStore gets used more. @fabiopitino what do you think?

Edited by 🤖 GitLab Bot 🤖