[SaaS] Seat overage email schedules an excessive number of jobs and API requests
This is a follow-up to the discussion here: https://gitlab.com/gitlab-org/gitlab/-/issues/348487#note_900886862
Problem
Currently, GitLab makes ~40K API requests per day to CustomersDot to notify it that a new member has been added to a group that has exceeded its number of purchased seats. These requests are made by GitlabSubscriptions::NotifySeatsExceededWorker. Only ~100 of them actually result in an email being sent to the group owners.
This volume of network requests and jobs can degrade performance or delay other jobs.
This problem led to the Sidekiq queues in CustomersDot getting inundated with jobs: SendSeatOverageNotificationJob continually enqu... (customers-gitlab-com#4934 - closed). In particular, we saw that one namespace had thousands of jobs and continued to enqueue more every few seconds. We put in a quick fix (https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/5581) to prevent requests for this namespace from enqueuing more Reconciliations::SendSeatOverageNotificationJob jobs. We also added a CDot feature flag, block_seat_overage_notification, to easily enable this blocking logic. Once this issue is resolved, we should be able to roll back this logic in CDot and remove the feature flag as part of customers-gitlab-com#4948 (closed).
Proposal
We should flag the groups that have recently exceeded their purchased seat allowance and use a scheduled job to send this information to CustomersDot in batches, rather than individually.
A potential solution would be:
- Create a Subscription#max_seats_used_changed_at field, with an index (being implemented in: !84913 (merged))
- Update the UpdateMaxSeatsUsedForGitlabComSubscriptionsWorker and the GitlabSubscription model callbacks to update the new field when the max_seats_used value changes (being implemented in: !84913 (merged))
- Update the subscription renewal process on GitLab to reset the new column (being implemented in: !84913 (merged))
- Update the CustomersDot GraphQL endpoint to support batches of notifications: https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/5677
- Create a scheduled job that looks for all active subscriptions whose max_seats_used_changed_at was set in the last 24 (or 48?) hours and sends them to CustomersDot in batches
  - Add an index on the max_seats_used_changed_at column to support this query
- Stop responding to the event from the member creation service
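The flow proposed in the list above (record when max_seats_used changes, reset the marker on renewal, then periodically send recently changed subscriptions in batches) can be sketched in plain Ruby. This is a minimal, hypothetical illustration, not GitLab's implementation: the Subscription class is an in-memory stand-in for the GitlabSubscription model, and SeatOverageBatchNotifier, its 24-hour window, the batch size of 100, and the client callable standing in for the CustomersDot GraphQL request are all assumptions. Filtering to active subscriptions, persistence, and the real GraphQL payload are omitted.

```ruby
# Hypothetical in-memory stand-in for the GitlabSubscription model. In GitLab the
# timestamp would be maintained by model callbacks; here a writer method plays that role.
class Subscription
  attr_reader :namespace_id, :max_seats_used, :max_seats_used_changed_at

  def initialize(namespace_id:, max_seats_used: 0)
    @namespace_id = namespace_id
    @max_seats_used = max_seats_used
    @max_seats_used_changed_at = nil
  end

  # Record when max_seats_used actually changes; unchanged values are a no-op.
  def update_max_seats_used(value, now: Time.now)
    return if value == @max_seats_used

    @max_seats_used = value
    @max_seats_used_changed_at = now
  end

  # Renewal clears the marker so the subscription is not reported again.
  def reset_on_renewal
    @max_seats_used_changed_at = nil
  end
end

# Hypothetical scheduled job: collects subscriptions changed within the window and
# hands them to the client in fixed-size batches instead of one request per new member.
class SeatOverageBatchNotifier
  def initialize(client:, window: 24 * 60 * 60, batch_size: 100)
    @client = client         # any callable; stands in for the CustomersDot request
    @window = window         # look-back window in seconds (24h assumed)
    @batch_size = batch_size # assumed batch size
  end

  # Returns the number of subscriptions sent.
  def perform(subscriptions, now: Time.now)
    cutoff = now - @window
    recent = subscriptions.select do |s|
      s.max_seats_used_changed_at && s.max_seats_used_changed_at >= cutoff
    end

    recent.each_slice(@batch_size) do |batch|
      @client.call(batch.map { |s| { namespace_id: s.namespace_id, max_seats_used: s.max_seats_used } })
    end

    recent.size
  end
end

# Usage (hypothetical data): 250 subscriptions whose max_seats_used rose an hour ago.
sent = []
client = ->(batch) { sent << batch }
now = Time.utc(2022, 4, 10, 12)
subs = (1..250).map { |i| Subscription.new(namespace_id: i, max_seats_used: 5) }
subs.each { |s| s.update_max_seats_used(6, now: now - 3600) }
subs.first.reset_on_renewal # renewed, so no longer reported
notified = SeatOverageBatchNotifier.new(client: client).perform(subs, now: now)
```

In this sketch the number of requests to CustomersDot depends on how many subscriptions changed during the window divided by the batch size, not on how many members were added, which is the property the proposal is after.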
Note: a more in-depth discussion of this issue started here, if more context is needed: https://gitlab.com/gitlab-org/gitlab/-/issues/348487#note_900886862